This repository has been archived by the owner on Nov 8, 2019. It is now read-only.

Error setting up network for a pod defined as manifest #55

Closed
slaws opened this issue Sep 29, 2015 · 6 comments

@slaws

slaws commented Sep 29, 2015

Hello,

It seems I've hit an issue with pods defined as manifests. The first time (at least) a pod is started from a manifest, the networking setup seems to stop before a profile is applied, and the container is not reachable from another host.

Some details on my setup:
Calico-kubernetes plugin version: 0.2.0
calicoctl version: 0.7.0
Pool configured: 172.17.0.0/16 (IPIP, with the workaround from calico-docker issue #426)
Node IP: 192.168.200.6
Master IP: 192.168.200.2

After a fresh install, here are the logs from the plugin: https://gist.github.com/slaws/995ae34856b6f8d8ddf0
Then after a reboot: https://gist.github.com/slaws/bbb67679ba978c6000e3
And the logs after a docker kill on the pod: https://gist.github.com/slaws/7bef22b5ae712c5fa756

After the docker kill, I can reach containers from another host.

I'll attach logs from the kubelet (with --v=5) ASAP.

After talking with

@slaws
Author

slaws commented Sep 30, 2015

I think this is the relevant part of the kubelet log:

Sep 30 09:24:06 kube-node-0 kubelet[452]: I0930 09:24:06.598897     452 exec.go:130] SetUpPod 'exec' network plugin output: Traceback (most recent call last):
Sep 30 09:24:06 kube-node-0 kubelet[452]: File "<string>", line 737, in <module>
Sep 30 09:24:06 kube-node-0 kubelet[452]: File "<string>", line 79, in create
Sep 30 09:24:06 kube-node-0 kubelet[452]: File "<string>", line 149, in _configure_profile
Sep 30 09:24:06 kube-node-0 kubelet[452]: File "<string>", line 434, in _get_pod_config
Sep 30 09:24:06 kube-node-0 kubelet[452]: KeyError: 'Pod not found: fluentd-elasticsearch-kube-node-0'
Sep 30 09:24:06 kube-node-0 kubelet[452]: , exit status 255
Sep 30 09:24:06 kube-node-0 kubelet[452]: E0930 09:24:06.602918     452 manager.go:1557] Failed to create pod infra container: exit status 255; Skipping pod "fluentd-elasticsearch-kube-node-0_kube-system"
Sep 30 09:24:06 kube-node-0 kubelet[452]: E0930 09:24:06.602973     452 pod_workers.go:111] Error syncing pod 3f414d923095ffb72919d37e3a73ec70, skipping: exit status 255
Sep 30 09:24:06 kube-node-0 kubelet[452]: I0930 09:24:06.603031     452 server.go:606] Event(api.ObjectReference{Kind:"Pod", Namespace:"kube-system", Name:"fluentd-elasticsearch-kube-node-0", UID:"3f414d923095ffb72919d37e3a73ec70", APIVersion:"v1", ResourceVersion:"", FieldPath:""}): reason: 'failedSync' Error syncing pod, skipping: exit status 255
Sep 30 09:24:06 kube-node-0 kubelet[452]: I0930 09:24:06.678445     452 docker.go:347] Docker Container: /calico-node is not managed by kubelet.

When an error occurs during the setup phase, the pod seems to stay up, but without any network configuration.
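For illustration, here is a minimal Python sketch of the kind of lookup the traceback points at. This is not the actual calico-kubernetes source: it only assumes, as the traceback suggests, that the plugin fetches the pod's configuration from the apiserver during SetUpPod and raises KeyError when the pod is not yet known to the API. The API URL (insecure port on the master) and function name are assumptions made for the example.

```python
# Hedged sketch only, not the real plugin code. Assumes the plugin lists pods
# from the apiserver while setting up networking and fails with KeyError when
# the pod is absent -- the failure seen in the kubelet log above.
import requests

API_ROOT = "http://192.168.200.2:8080/api/v1"  # master IP from this setup; insecure port assumed


def get_pod_config(namespace, pod_name):
    """Return the API object for the pod, or raise KeyError if the apiserver
    does not (yet) know about it."""
    pods = requests.get("%s/namespaces/%s/pods" % (API_ROOT, namespace)).json()
    for pod in pods.get("items", []):
        if pod["metadata"]["name"] == pod_name:
            return pod
    raise KeyError("Pod not found: %s" % pod_name)


# For a static (manifest) pod, the mirror pod may not have been posted to the
# apiserver yet when the network plugin runs, so this raises and SetUpPod fails.
```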

@caseydavenport
Member

@slaws Thanks for raising this. I'll be taking a look at this issue today and I'll post my findings.

My initial hunch is that it is either a race condition when creating pods via an on-disk manifest, or potentially a subtle bug in the Kubelet's network plugin API, but I'll be able to say for sure after some more digging.

@caseydavenport
Member

@slaws - It looks to me like this is an ordering problem in the Kubelet where the Kubernetes API is not informed of this pod until after the network plugin is called.

I've raised kubernetes/kubernetes#14992 to address this issue.
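To make the ordering problem concrete, here is a purely illustrative sketch (plain Python standing in for the kubelet's Go code; all function names are invented) of the broken ordering versus the ordering the upstream fix aims for:

```python
# Illustrative only: the real kubelet sync loop is Go and far more involved.
api_server_pods = set()  # what the apiserver currently knows about


def register_mirror_pod(pod_name):
    """Post a 'mirror pod' for a static/manifest pod to the apiserver."""
    api_server_pods.add(pod_name)


def network_plugin_setup(pod_name):
    """The plugin looks the pod up via the API (as in the traceback above)."""
    if pod_name not in api_server_pods:
        raise KeyError("Pod not found: %s" % pod_name)


def sync_static_pod_broken(pod_name):
    # Ordering reported in this issue: the plugin runs first and fails.
    network_plugin_setup(pod_name)   # KeyError -> SetUpPod fails, pod has no networking
    register_mirror_pod(pod_name)


def sync_static_pod_fixed(pod_name):
    # Ordering the upstream fix aims for: the API knows the pod before setup runs.
    register_mirror_pod(pod_name)
    network_plugin_setup(pod_name)   # lookup now succeeds


if __name__ == "__main__":
    try:
        sync_static_pod_broken("fluentd-elasticsearch-kube-node-0")
    except KeyError as e:
        print("broken ordering: %s" % e)
    sync_static_pod_fixed("fluentd-elasticsearch-kube-node-0")
    print("fixed ordering: setup succeeded")
```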

@slaws
Author

slaws commented Oct 2, 2015

@caseydavenport thanks a lot for the feedback.
I was thinking about it: since manifest pods are started directly by the kubelet and not through the API like other pods, the apiserver may not yet know about the pod creation if there is any latency.

If this is the cause, a potential workaround would be to use the experimental DaemonSet. I'll try to test this next week.
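One rough way to observe the latency described above is to poll the apiserver for the mirror pod right after the kubelet starts the manifest-defined pod. The following is a diagnostic sketch only, not a fix; the master's insecure port, the namespace, and the pod name are assumptions based on this setup:

```python
# Hedged diagnostic sketch: measure how long it takes the mirror pod for a
# manifest-defined pod to appear in the apiserver. Endpoint and names assumed.
import time
import requests

API = "http://192.168.200.2:8080/api/v1"
NAMESPACE = "kube-system"
POD = "fluentd-elasticsearch-kube-node-0"

start = time.time()
while True:
    r = requests.get("%s/namespaces/%s/pods/%s" % (API, NAMESPACE, POD))
    if r.status_code == 200:
        print("mirror pod visible after %.1fs" % (time.time() - start))
        break
    time.sleep(1)
```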

@caseydavenport
Member

@alexhersh has submitted a fix for this in kubernetes/kubernetes#16894.

@caseydavenport
Member

@slaws - The fix for this has been merged into upstream Kubernetes - hopefully it will be included in the next bugfix release.

If you'd like, you can try it out using a master build of Kubernetes. I'm going to close this issue for now.
