This repository has been archived by the owner on Nov 8, 2019. It is now read-only.

Error setting up network for a pod defined as manifest #55

Closed
slaws opened this issue Sep 29, 2015 · 6 comments

@slaws

slaws commented Sep 29, 2015

Hello,

It seems I've hit an issue with pods defined as manifests. The first time (at least) a pod is started from a manifest, the networking setup seems to stop before a profile is applied, and the container is not reachable from another host.

Some details on my setup:
Calico-kubernetes plugin version: 0.2.0
calicoctl version: 0.7.0
Pool configured: 172.17.0.0/16 (IPIP, with the workaround from calico-docker issue #426)
Node IP: 192.168.200.6
Master IP: 192.168.200.2

After a fresh install, here are the logs from the plugin: https://gist.github.com/slaws/995ae34856b6f8d8ddf0
Then after a reboot: https://gist.github.com/slaws/bbb67679ba978c6000e3
And the logs after a docker kill on the pod: https://gist.github.com/slaws/7bef22b5ae712c5fa756

After the docker kill, I can reach containers from another host.

I'll attach logs from the kubelet (with --v=5) ASAP.

After talking with

@slaws
Author

slaws commented Sep 30, 2015

I think this is the relevant part of the kubelet log:

Sep 30 09:24:06 kube-node-0 kubelet[452]: I0930 09:24:06.598897     452 exec.go:130] SetUpPod 'exec' network plugin output: Traceback (most recent call last):
Sep 30 09:24:06 kube-node-0 kubelet[452]: File "<string>", line 737, in <module>
Sep 30 09:24:06 kube-node-0 kubelet[452]: File "<string>", line 79, in create
Sep 30 09:24:06 kube-node-0 kubelet[452]: File "<string>", line 149, in _configure_profile
Sep 30 09:24:06 kube-node-0 kubelet[452]: File "<string>", line 434, in _get_pod_config
Sep 30 09:24:06 kube-node-0 kubelet[452]: KeyError: 'Pod not found: fluentd-elasticsearch-kube-node-0'
Sep 30 09:24:06 kube-node-0 kubelet[452]: , exit status 255
Sep 30 09:24:06 kube-node-0 kubelet[452]: E0930 09:24:06.602918     452 manager.go:1557] Failed to create pod infra container: exit status 255; Skipping pod "fluentd-elasticsearch-kube-node-0_kube-system"
Sep 30 09:24:06 kube-node-0 kubelet[452]: E0930 09:24:06.602973     452 pod_workers.go:111] Error syncing pod 3f414d923095ffb72919d37e3a73ec70, skipping: exit status 255
Sep 30 09:24:06 kube-node-0 kubelet[452]: I0930 09:24:06.603031     452 server.go:606] Event(api.ObjectReference{Kind:"Pod", Namespace:"kube-system", Name:"fluentd-elasticsearch-kube-node-0", UID:"3f414d923095ffb72919d37e3a73ec70", APIVersion:"v1", ResourceVersion:"", FieldPath:""}): reason: 'failedSync' Error syncing pod, skipping: exit status 255
Sep 30 09:24:06 kube-node-0 kubelet[452]: I0930 09:24:06.678445     452 docker.go:347] Docker Container: /calico-node is not managed by kubelet.

When an error occurs during the setup phase, the pod seems to stay up, but without any network configuration.
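For illustration, here is a minimal Python sketch of the kind of lookup the traceback points at. This is not the actual calico-kubernetes source: it only assumes, as the traceback suggests, that the plugin fetches the pod's configuration from the apiserver during SetUpPod and raises KeyError when the pod is not yet known to the API. The API URL (insecure port on the master) and function name are assumptions made for the example.

```python
# Hedged sketch only, not the real plugin code. Assumes the plugin lists pods
# from the apiserver while setting up networking and fails with KeyError when
# the pod is absent -- the failure seen in the kubelet log above.
import requests

API_ROOT = "http://192.168.200.2:8080/api/v1"  # master IP from this setup; insecure port assumed


def get_pod_config(namespace, pod_name):
    """Return the API object for the pod, or raise KeyError if the apiserver
    does not (yet) know about it."""
    pods = requests.get("%s/namespaces/%s/pods" % (API_ROOT, namespace)).json()
    for pod in pods.get("items", []):
        if pod["metadata"]["name"] == pod_name:
            return pod
    raise KeyError("Pod not found: %s" % pod_name)


# For a static (manifest) pod, the mirror pod may not have been posted to the
# apiserver yet when the network plugin runs, so this raises and SetUpPod fails.
```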

@caseydavenport
Member

@slaws Thanks for raising this. I'll be taking a look at this issue today and I'll post my findings.

My initial hunch is that it is either a race condition when creating pods via an on-disk manifest, or potentially a subtle bug in the Kubelet's network plugin API, but I'll be able to say for sure after some more digging.

@caseydavenport
Member

@slaws - It looks to me like this is an ordering problem in the Kubelet where the Kubernetes API is not informed of this pod until after the network plugin is called.

I've raised kubernetes/kubernetes#14992 to address this issue.
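To make the ordering problem concrete, here is a purely illustrative sketch (plain Python standing in for the kubelet's Go code; all function names are invented) of the broken ordering versus the ordering the upstream fix aims for:

```python
# Illustrative only: the real kubelet sync loop is Go and far more involved.
api_server_pods = set()  # what the apiserver currently knows about


def register_mirror_pod(pod_name):
    """Post a 'mirror pod' for a static/manifest pod to the apiserver."""
    api_server_pods.add(pod_name)


def network_plugin_setup(pod_name):
    """The plugin looks the pod up via the API (as in the traceback above)."""
    if pod_name not in api_server_pods:
        raise KeyError("Pod not found: %s" % pod_name)


def sync_static_pod_broken(pod_name):
    # Ordering reported in this issue: the plugin runs first and fails.
    network_plugin_setup(pod_name)   # KeyError -> SetUpPod fails, pod has no networking
    register_mirror_pod(pod_name)


def sync_static_pod_fixed(pod_name):
    # Ordering the upstream fix aims for: the API knows the pod before setup runs.
    register_mirror_pod(pod_name)
    network_plugin_setup(pod_name)   # lookup now succeeds


if __name__ == "__main__":
    try:
        sync_static_pod_broken("fluentd-elasticsearch-kube-node-0")
    except KeyError as e:
        print("broken ordering: %s" % e)
    sync_static_pod_fixed("fluentd-elasticsearch-kube-node-0")
    print("fixed ordering: setup succeeded")
```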

@slaws
Author

slaws commented Oct 2, 2015

@caseydavenport thanks a lot for the feedback.
I was thinking about it: since manifest pods are started directly by the kubelet and not through the API like other pods, the apiserver may not yet know about the pod creation if there is any latency.

If this is the cause, a potential workaround would be to use the experimental DaemonSet. I'll try to test this next week.
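One rough way to observe the latency described above is to poll the apiserver for the mirror pod right after the kubelet starts the manifest-defined pod. The following is a diagnostic sketch only, not a fix; the master's insecure port, the namespace, and the pod name are assumptions based on this setup:

```python
# Hedged diagnostic sketch: measure how long it takes the mirror pod for a
# manifest-defined pod to appear in the apiserver. Endpoint and names assumed.
import time
import requests

API = "http://192.168.200.2:8080/api/v1"
NAMESPACE = "kube-system"
POD = "fluentd-elasticsearch-kube-node-0"

start = time.time()
while True:
    r = requests.get("%s/namespaces/%s/pods/%s" % (API, NAMESPACE, POD))
    if r.status_code == 200:
        print("mirror pod visible after %.1fs" % (time.time() - start))
        break
    time.sleep(1)
```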

@caseydavenport
Member

@alexhersh has submitted a fix for this in kubernetes/kubernetes#16894.

@caseydavenport
Member

@slaws - The fix for this has been merged into upstream Kubernetes - hopefully it will be included in the next bugfix release.

If you'd like, you can try it out using a master build of Kubernetes. I'm going to close this issue for now.
