
Pod unable to connect to itself via service #61593

Closed
codebreach opened this issue Mar 23, 2018 · 10 comments
Labels
kind/bug Categorizes issue or PR as related to a bug. sig/network Categorizes an issue or PR as relevant to SIG Network.

Comments

@codebreach

/kind bug

What happened:
Unable to route network requests from pod to self via service

What you expected to happen:
Network requests should work

How to reproduce it (as minimally and precisely as possible):

  • Deploy new cluster on GKE 1.9.4-gke.1 (containerOS)
  • Set up Pod1, Service1 (to Pod1)
  • bash into Pod1 and try to curl Service1 (see the command sketch after this list)
  • Expected: curl succeeds
  • Actual: curl fails (IO Timeout)
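
A minimal command-line sketch of these steps (all names are illustrative; the image is assumed to serve on port 80 and to have curl available):

# hypothetical pod and service names; any image that listens on port 80 and ships curl will do
kubectl run pod1 --image=nginx --restart=Never --labels=app=pod1
kubectl expose pod pod1 --name=service1 --port=80 --target-port=80
# from inside Pod1, curl the service that points back at it
kubectl exec -it pod1 -- curl -v --max-time 10 http://service1.default.svc.cluster.local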

Environment:

  • Kubernetes version (use kubectl version): 1.9.4-gke.1 (also with 1.7.12-gke.1)
  • Cloud provider or hardware configuration: GCE (GKE)
  • Install tools: None - using GCE
  • Others: None - fresh install

Long story:

I have a pod P1 running on node N1 and a ClusterIP service S1 pointing to P1
I have a pod P2 running on node N1 trying to access service S1 (http://s1.default.svc.cluster.local) - this times out
I have another pod P3 running on node N2 trying to access service S1 (same url) - this works

P2 continuously hits I/O timeouts, while P3 works fine. As a workaround I have used a nodeSelector to make sure P1 and P2 are never scheduled on the same node.
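
For reference, a rough sketch of that workaround (deployment, node, and label names here are hypothetical):

# label the node that should host P2, then pin P2's deployment to it so it never lands next to P1
kubectl label node n2 workload=p2-only
kubectl patch deployment p2 -p '{"spec":{"template":{"spec":{"nodeSelector":{"workload":"p2-only"}}}}}'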

Now I have a situation where P1 needs to access itself using S1, i.e. the pod that S1 points to has to reach itself through S1.

I have already looked at a lot of GitHub issues and Stack Overflow posts, and it seems I need to set hairpin mode or install CNI flannel. But those fixes appear to apply only to custom (kubeadm) deployments. I am running on GKE and can't change these settings... so any ideas?

@k8s-ci-robot k8s-ci-robot added needs-sig Indicates an issue or PR lacks a `sig/foo` label and requires one. kind/bug Categorizes issue or PR as related to a bug. labels Mar 23, 2018
@codebreach
Author

/sig network

@k8s-ci-robot k8s-ci-robot added sig/network Categorizes an issue or PR as relevant to SIG Network. and removed needs-sig Indicates an issue or PR lacks a `sig/foo` label and requires one. labels Mar 23, 2018
@foxyriver
Contributor

https://kubernetes.io/docs/tasks/debug-application-cluster/debug-service/#a-pod-cannot-reach-itself-via-service-ip
kubelet solves this with the --hairpin-mode flag, or you can run ifconfig docker0 promisc by hand to put the docker0 bridge into promiscuous mode.
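
Roughly, the two options look like this on a node where you control kubelet yourself (just a sketch; on GKE the kubelet flags are managed for you):

# option 1: start kubelet with hairpin support, e.g.
#   kubelet ... --hairpin-mode=hairpin-veth
# option 2: put the docker bridge into promiscuous mode by hand (not persistent across reboots)
sudo ifconfig docker0 promisc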

@codebreach
Author

codebreach commented Mar 24, 2018 via email

@codebreach
Author

Doing ps auxw|grep kubelet on a node results in the following:

root      1353 10.2  1.1 681760 154952 ?       Ssl  Mar07 2506:55 /home/kubernetes/bin/kubelet --v=2 --docker-disable-shared-pid --kube-reserved=cpu=70m,memory=2331Mi --allow-privileged=true --cgroup-root=/ --cloud-provider=gce --cluster-dns=10.51.240.10 --cluster-domain=cluster.local --pod-manifest-path=/etc/kubernetes/manifests --experimental-mounter-path=/home/kubernetes/containerized_mounter/mounter --experimental-check-node-capabilities-before-mount=true --cert-dir=/var/lib/kubelet/pki/ --enable-debugging-handlers=true --bootstrap-kubeconfig=/var/lib/kubelet/bootstrap-kubeconfig --require-kubeconfig --kubeconfig=/var/lib/kubelet/kubeconfig --anonymous-auth=false --authorization-mode=Webhook --client-ca-file=/etc/srv/kubernetes/pki/ca-certificates.crt --cni-bin-dir=/home/kubernetes/bin --network-plugin=kubenet --node-labels=beta.kubernetes.io/fluentd-ds-ready=true,cloud.google.com/gke-nodepool=highmempool --eviction-hard=memory.available<100Mi,nodefs.available<10%,nodefs.inodesFree<5% --feature-gates=ExperimentalCriticalPodAnnotation=true

Note the lack of hairpin mode in the flags.

running journalctl I get:

Mar 21 14:21:41 gke-cluster-1-highmempool-0e4f9272-z670 kubelet[1324]: I0321 14:21:41.235977    1324 kubenet_linux.go:265] CNI network config set to {
Mar 21 14:21:41 gke-cluster-1-highmempool-0e4f9272-z670 kubelet[1324]:   "cniVersion": "0.1.0",
Mar 21 14:21:41 gke-cluster-1-highmempool-0e4f9272-z670 kubelet[1324]:   "name": "kubenet",
Mar 21 14:21:41 gke-cluster-1-highmempool-0e4f9272-z670 kubelet[1324]:   "type": "bridge",
Mar 21 14:21:41 gke-cluster-1-highmempool-0e4f9272-z670 kubelet[1324]:   "bridge": "cbr0",
Mar 21 14:21:41 gke-cluster-1-highmempool-0e4f9272-z670 kubelet[1324]:   "mtu": 1460,
Mar 21 14:21:41 gke-cluster-1-highmempool-0e4f9272-z670 kubelet[1324]:   "addIf": "eth0",
Mar 21 14:21:41 gke-cluster-1-highmempool-0e4f9272-z670 kubelet[1324]:   "isGateway": true,
Mar 21 14:21:41 gke-cluster-1-highmempool-0e4f9272-z670 kubelet[1324]:   "ipMasq": false,
Mar 21 14:21:41 gke-cluster-1-highmempool-0e4f9272-z670 kubelet[1324]:   "hairpinMode": false,
Mar 21 14:21:41 gke-cluster-1-highmempool-0e4f9272-z670 kubelet[1324]:   "ipam": {
Mar 21 14:21:41 gke-cluster-1-highmempool-0e4f9272-z670 kubelet[1324]:     "type": "host-local",
Mar 21 14:21:41 gke-cluster-1-highmempool-0e4f9272-z670 kubelet[1324]:     "subnet": "10.48.9.0/24",
Mar 21 14:21:41 gke-cluster-1-highmempool-0e4f9272-z670 kubelet[1324]:     "gateway": "10.48.9.1",
Mar 21 14:21:41 gke-cluster-1-highmempool-0e4f9272-z670 kubelet[1324]:     "routes": [
Mar 21 14:21:41 gke-cluster-1-highmempool-0e4f9272-z670 kubelet[1324]:       { "dst": "0.0.0.0/0" }
Mar 21 14:21:41 gke-cluster-1-highmempool-0e4f9272-z670 kubelet[1324]:     ]
Mar 21 14:21:41 gke-cluster-1-highmempool-0e4f9272-z670 kubelet[1324]:   }
Mar 21 14:21:41 gke-cluster-1-highmempool-0e4f9272-z670 kubelet[1324]: }
Mar 21 14:21:41 gke-cluster-1-highmempool-0e4f9272-z670 kubelet[1324]: I0321 14:21:41.239399    1324 kubelet_network.go:326] Setting Pod CIDR:  -> 10.48.9.0/24
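
One way to check whether hairpin is actually enabled on the bridge ports would be to read sysfs on the node (bridge name cbr0 taken from the config above; veth port names will differ):

for p in /sys/class/net/cbr0/brif/*; do echo "$p hairpin_mode=$(cat $p/hairpin_mode)"; done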

@foxyriver do you know how to go about changing these flags on GKE? I am also discussing the same with GCP support...

@codebreach
Author

Running ifconfig docker0 promisc does not fix the issue either. I still get I/O timeouts when hitting the service from the pod it points to.
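
Since the kubelet flags above show --network-plugin=kubenet with bridge cbr0, docker0 is presumably not the bridge carrying pod traffic here; if anything, the equivalent manual step would be on cbr0 (an untested assumption on my part):

sudo ifconfig cbr0 promisc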

@foxyriver
Contributor

foxyriver commented Mar 26, 2018

@codebreach According to the kubelet startup parameters, your Kubernetes network is set up through a CNI plugin. You should therefore run ifconfig cni0 promisc, because cni0 takes over the role of docker0 in that case.

@foxyriver
Contributor

do you know how to go about changing these flags on GKE?

Sorry, I haven't used GKE. :)

@codebreach
Author

@foxyriver unfortunately that doesn't seem to work:

ifconfig cni0 promisc -> cni0: ERROR while getting interface flags: No such device

Output of ifconfig follows (no cni0).
All interfaces except veth*, lo and eth0 have PROMISC set.

cbr0: flags=4419<UP,BROADCAST,RUNNING,PROMISC,MULTICAST>  mtu 1460
        inet 10.48.9.1  netmask 255.255.255.0  broadcast 0.0.0.0
        inet6 fe80::3448:8ff:fe9d:666b  prefixlen 64  scopeid 0x20<link>
        ether 0a:58:0a:30:09:01  txqueuelen 1000  (Ethernet)
        RX packets 52100551  bytes 13071117917 (12.1 GiB)
        RX errors 0  dropped 0  overruns 0  frame 0
        TX packets 53773907  bytes 76983331046 (71.6 GiB)
        TX errors 0  dropped 0 overruns 0  carrier 0  collisions 0

docker0: flags=4355<UP,BROADCAST,PROMISC,MULTICAST>  mtu 1500
        inet 169.254.123.1  netmask 255.255.255.0  broadcast 0.0.0.0
        ether 02:42:08:d7:de:9b  txqueuelen 0  (Ethernet)
        RX packets 0  bytes 0 (0.0 B)
        RX errors 0  dropped 0  overruns 0  frame 0
        TX packets 0  bytes 0 (0.0 B)
        TX errors 0  dropped 0 overruns 0  carrier 0  collisions 0

eth0: flags=4163<UP,BROADCAST,RUNNING,MULTICAST>  mtu 1460
        inet 10.148.0.6  netmask 255.255.255.255  broadcast 10.148.0.6
        inet6 fe80::4001:aff:fe94:6  prefixlen 64  scopeid 0x20<link>
        ether 42:01:0a:94:00:06  txqueuelen 1000  (Ethernet)
        RX packets 113624570  bytes 94681759638 (88.1 GiB)
        RX errors 0  dropped 0  overruns 0  frame 0
        TX packets 90249305  bytes 25818163662 (24.0 GiB)
        TX errors 0  dropped 0 overruns 0  carrier 0  collisions 0

lo: flags=73<UP,LOOPBACK,RUNNING>  mtu 65536
        inet 127.0.0.1  netmask 255.0.0.0
        inet6 ::1  prefixlen 128  scopeid 0x10<host>
        loop  txqueuelen 1  (Local Loopback)
        RX packets 603436  bytes 58435355 (55.7 MiB)
        RX errors 0  dropped 0  overruns 0  frame 0
        TX packets 603436  bytes 58435355 (55.7 MiB)
        TX errors 0  dropped 0 overruns 0  carrier 0  collisions 0

veth40366023: flags=4163<UP,BROADCAST,RUNNING,MULTICAST>  mtu 1460
        inet6 fe80::bf:11ff:fe2d:1f02  prefixlen 64  scopeid 0x20<link>
        ether 02:bf:11:2d:1f:02  txqueuelen 0  (Ethernet)
        RX packets 12924  bytes 1409684 (1.3 MiB)
        RX errors 0  dropped 0  overruns 0  frame 0
        TX packets 21068  bytes 68026272 (64.8 MiB)
        TX errors 0  dropped 66 overruns 0  carrier 0  collisions 0

veth43244706: flags=4163<UP,BROADCAST,RUNNING,MULTICAST>  mtu 1460
        inet6 fe80::3492:cbff:fef7:464f  prefixlen 64  scopeid 0x20<link>
        ether 36:92:cb:f7:46:4f  txqueuelen 0  (Ethernet)
        RX packets 5014  bytes 425104 (415.1 KiB)
        RX errors 0  dropped 0  overruns 0  frame 0
        TX packets 18945  bytes 1437645 (1.3 MiB)
        TX errors 0  dropped 76 overruns 0  carrier 0  collisions 0

[...truncated...]

@foxyriver
Contributor

foxyriver commented Mar 27, 2018

@codebreach Is there an interface named kubenet on your host or not?

@codebreach
Author

codebreach commented Mar 27, 2018 via email
