
Calico + KIND pods unable to communicate externally #2962

Closed

sager-tech opened this issue Oct 27, 2019 · 16 comments

@sager-tech

Expected Behavior

Deploy KIND
Deploy Calico
See pods come up successfully, with the coreDNS pods able to dig and ping external hosts

Current Behavior

Newly deployed pods are unable to transition into the Ready state, and the coreDNS pods are not able to communicate externally via ping or dig.

Steps to Reproduce (for bugs)

  1. Deploy KIND
  2. Deploy Calico
  3. Deploy another pod

Logs

 [ERROR] plugin/errors: 2 7893400373289152203.6025455479212086695. HINFO: unreachable backend: read udp 10.244.1.3:38701->192.168.65.1:53: i/o timeout
 [ERROR] plugin/errors: 2 7893400373289152203.6025455479212086695. HINFO: unreachable backend: read udp 10.244.1.3:47810->192.168.65.1:53: i/o timeout
 [ERROR] plugin/errors: 2 7893400373289152203.6025455479212086695. HINFO: unreachable backend: read udp 10.244.1.3:54801->192.168.65.1:53: i/o timeout
 [ERROR] plugin/errors: 2 7893400373289152203.6025455479212086695. HINFO: unreachable backend: read udp 10.244.1.3:45085->192.168.65.1:53: i/o timeout
 [ERROR] plugin/errors: 2 7893400373289152203.6025455479212086695. HINFO: unreachable backend: read udp 10.244.1.3:34876->192.168.65.1:53: i/o timeout
 [ERROR] plugin/errors: 2 amazon.com. A: unreachable backend: read udp 10.244.1.3:58544->192.168.65.1:53: i/o timeout
 [ERROR] plugin/errors: 2 amazon.com. A: unreachable backend: read udp 10.244.1.3:45441->192.168.65.1:53: i/o timeout
 [ERROR] plugin/errors: 2 amazon.com. A: unreachable backend: read udp 10.244.1.3:51907->192.168.65.1:53: i/o timeout
 [ERROR] plugin/errors: 2 amazon.com. A: unreachable backend: read udp 10.244.1.3:36537->192.168.65.1:53: i/o timeout
 [ERROR] plugin/errors: 2 amazon.com. A: unreachable backend: read udp 10.244.1.3:49806->192.168.65.1:53: i/o timeout
; <<>> DiG 9.11.3-1ubuntu1.8-Ubuntu <<>> -t A +tries=5 +retry=5 +time=1 amazon.com
;; global options: +cmd
;; connection timed out; no servers could be reached

Your Environment

  • Calico version: Attempted with 3.0, 3.2, 3.3, 3.10, master
  • Orchestrator version (e.g. kubernetes, mesos, rkt): {Major:"1", Minor:"14", GitVersion:"v1.14.3"}
  • Operating System and version: darwin
@tmjd tmjd self-assigned this Oct 28, 2019
@tmjd
Member

tmjd commented Oct 28, 2019

@song-jiang or @neiljerram I think you've both been using KIND recently; do either of you have any suggestions or tricks for making this work? Or maybe this use case is too different from what you've been doing.

@neiljerram
Member

Can the KIND nodes communicate externally? (E.g. docker exec kind-worker apt-get update)

If no: it's not a Calico problem then.

If yes: please check that NatOutgoing is enabled in your IP pool.
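One quick way to check this (a sketch, assuming calicoctl is installed and configured for the cluster, and that the pool has the default-install name `default-ipv4-ippool`):

```shell
# Show the natOutgoing setting on the default IP pool.
# If the grep prints nothing or "natOutgoing: false", outgoing NAT
# is not enabled for pod traffic leaving the cluster.
calicoctl get ippool default-ipv4-ippool -o yaml | grep natOutgoing
```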

@neiljerram
Member

Oh hang on, I think it might just be /etc/resolv.conf. Our recent KIND work has this:

    # Fix /etc/resolv.conf in each node.
    ${KIND} get nodes | xargs -n1 -I {} docker exec {} sh -c "echo nameserver 8.8.8.8 > /etc/resolv.conf"

@sager-tech
Author

@neiljerram I ran docker exec kind-worker apt-get update and was able to see it trying to update apt-get. How would I enable NatOutgoing in the IP pool?

As an update, I deployed Calico v3.0 with the Kubernetes API datastore and it is able to bring up the deployment and the coredns pods without the nameserver or DNS resolution issues. However, any new pods I deploy in a deployment are unable to reach the Ready state, always stuck in CrashLoopBackOff. There are no logs; the only indicator I see in the description of the pod is:

Warning  BackOff    77s (x25 over 6m26s)   kubelet, kind-worker  Back-off restarting failed container

Any ideas as to why this is happening?

@tmjd
Member

tmjd commented Oct 29, 2019

I thought we had a doc for editing IP Pools but I'm not finding it. You should be able to change NATOutgoing by using calicoctl to get your IP Pool, update it, and then apply your changes.

When you query the logs for a pod you might want to try -p to see the logs from a previous run; I think that has helped me before. You could also try looking at the kubelet logs to see if there is anything useful there, but it sounds like the pod itself is exiting, so I doubt the kubelet logs will have anything useful.
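The get/edit/apply cycle described above can be sketched as follows (assuming calicoctl is configured for the cluster and the pool has the default name; the filename `pool.yaml` is just illustrative):

```shell
# Dump the current pool definition to a file.
calicoctl get ippool default-ipv4-ippool -o yaml > pool.yaml

# Edit pool.yaml so that spec contains:
#   natOutgoing: true

# Re-apply the modified pool.
calicoctl apply -f pool.yaml
```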

@sager-tech
Author

sager-tech commented Oct 29, 2019

@tmjd I did this deployment using kind as:

kind create cluster --config `find . -name deployment.yaml` --image kindest/node:v1.13.7

with deployment.yaml as:

kind: Cluster
apiVersion: kind.sigs.k8s.io/v1alpha3
nodes:
- role: control-plane
- role: worker
- role: worker
networking:
  disableDefaultCNI: True
kubeadmConfigPatches:
- |
  apiVersion: kubeadm.k8s.io/v1beta2
  kind: ClusterConfiguration
  metadata:
    name: config
  networking:
    serviceSubnet: "10.96.0.1/12"
    podSubnet: "192.168.0.0/16"

I do have calicoctl, but I do not see anything in the docs about configuring calico to point to an existing deployment. Is that possible?

@neiljerram
Member

@sager-tech

I do not see anything in the docs about configuring calico to point to an existing deployment. Is that possible?

Yes, please see https://docs.projectcalico.org/v3.10/getting-started/calicoctl/configure/kdd. If you are using KDD, you probably just need

export DATASTORE_TYPE=kubernetes
export KUBECONFIG=<path to your kubeconfig file>

and then calicoctl should connect.
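A minimal way to verify the connection after setting those variables (the kubeconfig path is an assumption; substitute your own):

```shell
export DATASTORE_TYPE=kubernetes
export KUBECONFIG=~/.kube/config   # assumed path; use your actual kubeconfig

# If calicoctl can reach the Kubernetes datastore, this lists the IP pools.
calicoctl get ippool -o wide
```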

@neiljerram
Member

@sager-tech Also, stepping back to your reported problem...

Please try to distinguish between problems with name resolution (aka DNS) and IP reachability. If ping 8.8.8.8 works, but not ping google.com, it's a name resolution problem. In that case, look at the /etc/resolv.conf in the place (i.e. host or pod) that you're pinging from.

If you can ping 8.8.8.8 from the host, but not from a pod, that indicates missing SNAT/MASQUERADE, aka NatOutgoing - i.e. when the ping request reaches 8.8.8.8, the ping response can't be routed back, because the source IP of the request is still that of the originating pod, which is a private IP that makes no sense to 8.8.8.8. (Actually in this case the request would have been dropped earlier because of an RPF check, but I hope you get the idea anyway.)

Hope that gives you a few ideas to look at...
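The distinction above can be tested directly from a pod (a sketch, assuming a pod named `myapp-pod` in the default namespace whose image includes ping and nslookup):

```shell
# Raw IP reachability: bypasses DNS entirely.
kubectl exec myapp-pod -- ping -c 1 8.8.8.8

# Name resolution: only works if DNS is also functioning.
kubectl exec myapp-pod -- nslookup google.com

# If the ping succeeds but the lookup fails, it's a DNS problem;
# if both fail, it's a routing/NAT (e.g. NatOutgoing) problem.
```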

@sager-tech
Author

@neiljerram I set up calicoctl using this strategy. I spent some more time and I can report the following:

This is the pared down deployment:

NAMESPACE     NAME                                         READY   STATUS             RESTARTS   AGE   IP           NODE                 NOMINATED NODE   READINESS GATES
default       myapp-pod                                    1/1     Running            0          33m   10.244.2.4   kind-worker          <none>           <none>
kube-system   calico-node-xq722                            2/2     Running            0          83m   172.17.0.2   kind-control-plane   <none>           <none>
kube-system   calicoctl                                    1/1     Running            0          95m   172.17.0.4   kind-worker2         <none>           <none>
kube-system   coredns-7747b9c446-w2kmh                     2/2     Running            0          24m   10.244.2.5   kind-worker          <none>           <none>
kube-system   etcd-kind-control-plane                      1/1     Running            0          98m   172.17.0.2   kind-control-plane   <none>           <none>
kube-system   kube-apiserver-kind-control-plane            1/1     Running            0          97m   172.17.0.2   kind-control-plane   <none>           <none>
kube-system   kube-controller-manager-kind-control-plane   1/1     Running            0          97m   172.17.0.2   kind-control-plane   <none>           <none>
kube-system   kube-proxy-szhjs                             1/1     Running            0          98m   172.17.0.4   kind-worker2         <none>           <none>
kube-system   kube-scheduler-kind-control-plane            1/1     Running            0          98m   172.17.0.2   kind-control-plane   <none>           <none>

From my own pod, myapp-pod, I set /etc/resolv.conf to point specifically at the coredns pod by editing the spec to:

spec:
  dnsPolicy: "None"
  dnsConfig:
    nameservers:
      - 10.244.1.2

and the nameserver on myapp-pod shows 10.244.1.2. I can go to the host machine of the coredns pod and am able to dig and ping successfully. However, from the coredns pod I am not able to make any successful dig or curl connections, and /etc/resolv.conf on the coredns pod points to the IP of the host machine it is on.

I have checked the NetworkPolicy and it is currently set to enable all egress and ingress. I agree that it seems like a NatOutgoing issue - pod cannot talk to host, but host can talk to external world, but I am not sure where the resolution to that problem would live.

This is where I am currently stuck. I will look into your idea about SNAT/MASQUERADE.

@neiljerram
Member

@sager-tech I'm afraid your comments are still mixing up name resolution and IP reachability. Can you ping 8.8.8.8 from myapp-pod and the coredns pod?

@sager-tech
Author

@neiljerram You were correct about the name resolution and IP reachability mixup.

I am able to ping successfully from the host, but not from the pods (neither the coredns pod nor myapp-pod). I inspected the IP pool and it does have natOutgoing and ipipMode enabled:

apiVersion: projectcalico.org/v3
items:
- apiVersion: projectcalico.org/v3
  kind: IPPool
  metadata:
    creationTimestamp: 2019-10-29T20:50:50Z
    name: default-ipv4-ippool
    resourceVersion: "2039"
    uid: ca9ddcf6-fa8d-11e9-a93e-0242ac110004
  spec:
    blockSize: 26
    cidr: 192.168.0.0/16
    ipipMode: Always
    natOutgoing: true
    nodeSelector: all()
kind: IPPoolList
metadata:
  resourceVersion: "19965"

So you are correct about

that indicates missing SNAT/MASQUERADE, aka NatOutgoing

but it is enabled in the ipPool config.

I'm going through more of the Calico docs to see if there is anything about how to modify the iptables rules, because I feel it is very close to working. Any suggestions you have would be very much appreciated!
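For inspecting what NAT rules Calico has actually programmed on a KIND node, something like the following works (a sketch; assumes a node container named `kind-worker`, and the exact chain names vary by Calico version):

```shell
# List the nat-table rules inside the KIND node container and look for
# the masquerade rules Calico installs when natOutgoing is enabled.
docker exec kind-worker iptables -t nat -S | grep -i masq
```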

@neiljerram
Member

Does your local network also have addresses that match 192.168.0.0/16 ? (For home networks, this is pretty common.) If so, I wonder if there is a confusion somewhere between routing to devices on your home network, and routing to pods?

@neiljerram
Member

Oh, I think the problem is that KIND's default for the pod CIDR is 10.244.0.0/16, and Calico's default is 192.168.0.0/16, and they don't match.

Can you try again with something like this to modify the CIDR in the Calico YAML:

    wget -O - https://docs.projectcalico.org/v3.9/manifests/calico.yaml | \
	sed 's,192.168.0.0/16,10.244.0.0/16,' | \
	kubectl apply -f -

@sager-tech
Author

@neiljerram I was able to have the pods come up and ping successfully, thank you. I'm a bit confused though -- in the config passed to KIND on cluster create I specified:

kubeadmConfigPatches:
- |
  apiVersion: kubeadm.k8s.io/v1beta2
  kind: ClusterConfiguration
  metadata:
    name: config
  networking:
    serviceSubnet: "10.96.0.1/12"
    podSubnet: "192.168.0.0/16"

so it should have matched the calico CIDR manifest (also 192.168.0.0/16). Does setting it there not affect the KIND pod cidr?

@neiljerram
Member

Well, some of your output above definitely shows 10.244 pod addresses. So perhaps KIND missed processing that config for some reason, or another field needs setting, or something; but I'm afraid I don't know KIND that well yet.

Anyway, great that things seem to be working for you now.
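For reference, kind's v1alpha3 config also accepts the pod subnet directly in the top-level networking block of the Cluster object, which avoids relying on the kubeadm patch being matched. A sketch (untested against this exact kind version):

```shell
# Create the cluster with the pod CIDR set at the kind Cluster level,
# so it matches the CIDR in the Calico manifest.
cat <<EOF | kind create cluster --config /dev/stdin
kind: Cluster
apiVersion: kind.sigs.k8s.io/v1alpha3
networking:
  disableDefaultCNI: true
  podSubnet: "192.168.0.0/16"
nodes:
- role: control-plane
- role: worker
- role: worker
EOF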

@sager-tech
Author

sager-tech commented Nov 7, 2019

Thanks a lot for your help! It's working now. @neiljerram
