Kubernetes Core via Canonical Ubuntu localinstall DNS lookup of services does not work #286

Open
jayeshnazre opened this Issue May 12, 2017 · 9 comments


Hi,
I am new to Canonical Kubernetes, Juju, and conjure-up, but I followed the instructions to the letter to install kubernetes-core, and everything installed correctly on my local VM running Ubuntu 16.04.2. However, from within the pods, service names do not resolve to IPs, nor can I reach the service ClusterIPs. I have only one worker node, since I am using kubernetes-core. NOTE: my kube-dns pod is running, and I cannot figure out what is wrong. Can someone help me with this issue? I have exposed the service as a NodePort, and the only way I can reach it is via the NodePort on the worker node's IP, but I want to reach the service by name from within the cluster.
NOTE: All my services are in the default namespace, so what I am trying to get working is purely internal service1-to-service2 communication via the logical name (service1 and service2 are just example names used to explain the problem). My VM runs on VMware Workstation 12.
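For illustration only (service1, service2, the port, and the pod name are placeholders, not my real workloads), this is roughly the kind of in-cluster call that fails versus the NodePort access that works:

$ kubectl exec -it <service1-pod> -- curl http://service2:8080                 # fails: could not resolve host
$ kubectl exec -it <service1-pod> -- curl http://<service2-cluster-ip>:8080    # also fails via the ClusterIP
$ curl http://<worker-node-ip>:<nodeport>                                      # works from outside the cluster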

Member

ktsakalozos commented May 15, 2017

Hi @jayeshnazre

Can you help me reproduce this? I understand you start with the deployment of kubernetes-core in a localhost (LXD) based environment. With this setup all units are deployed in separate LXC containers within the VM you use. From there you deploy some services. Is it possible to share the YAML files you apply, or the steps you follow to deploy these services, so I can reproduce your setup?

As a quick test, can you deploy a shell-demo pod and try to resolve the names of other services? For example:

juju ssh kubernetes-master/0

Inside master:

kubectl create -f https://k8s.io/docs/tasks/debug-application-cluster/shell-demo.yaml
kubectl exec -it shell-demo -- /bin/bash

Inside shell-demo:

getent hosts default-http-backend

Thanks

Thanks for responding. The steps I followed are very simple:
Step 1) On VMware Workstation 12 Pro, I created a VM with 8GB RAM and a 40GB disk, running Ubuntu 16.04.2 LTS (xenial).
Step 2) Used the following commands to install Canonical kubernetes-core:
sudo snap install conjure-up --classic
newgrp lxd
conjure-up kubernetes-core
Step 3) Picked the self-hosted controller and the localhost option.

That's it. Everything was installed successfully behind the scenes, but DNS lookup of logical service names does not work. Everything works via direct IPs.

A few things to be aware of:
a) I used custom Docker images and loaded them onto the worker node directly with docker load. I have just one worker node (as I am using kubernetes-core).
b) I then tried a basic Docker Hub image (httpd:latest), and service name lookups did not work with it either.
It's really disappointing; I thought Juju and conjure-up would abstract the complexity of Kubernetes away and let me concentrate on using the cluster. Now I feel it would be better to get hands-on by spinning up multiple VMs in VMware and manually configuring the Kubernetes components (kubelet, etcd, kube-proxy, etc.); at least then I would know what I am doing. I hope you can recreate the issue at your end with the steps above. Any help is much appreciated, as I am close to giving up on this approach to multi-node Kubernetes.

Member

ktsakalozos commented May 16, 2017

After step 3 you have an Ubuntu VM on VMware that hosts LXC containers running Kubernetes. Inside the LXC containers is where the Docker containers spawn.

You mention that "DNS logical service name lookup does not work"; since there are several levels of nested machines (Docker inside LXC inside a VM inside your physical host), can you please give me the exact command you are issuing to resolve a logical service name? Where are you issuing this command? What output were you expecting, and what output did you get?

Loading Docker images with Docker primitives on the kubernetes workers is fine, but Kubernetes manages the containers themselves, so it needs to be aware of what is running at any point in time: you should be using pod/service/deployment YAML description files to spawn your services. That way the KubeDNS service running within Kubernetes gets populated with the right entries.
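As a minimal sketch (the names and image below are only illustrative), a Deployment plus a Service of the same name could be applied like this, after which the service name becomes resolvable through KubeDNS:

$ kubectl apply -f - <<EOF
apiVersion: apps/v1beta1
kind: Deployment
metadata:
  name: my-app                # illustrative name
spec:
  replicas: 1
  template:
    metadata:
      labels:
        app: my-app
    spec:
      containers:
      - name: my-app
        image: httpd:latest   # any image serving on port 80
        ports:
        - containerPort: 80
---
apiVersion: v1
kind: Service
metadata:
  name: my-app                # this is the name KubeDNS will serve
spec:
  selector:
    app: my-app
  ports:
  - port: 80
EOF

Pods in the default namespace can then reach it as http://my-app (or my-app.default.svc.cluster.local).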

The work we are doing at Canonical on the Kubernetes distribution does indeed abstract the complexity of setting up and managing the cluster. There may be use cases we are not covering yet, and we are willing to work on them with the community (with you). Before you go and take on the Kubernetes beast by yourself, here is what you can do: join us on the freenode IRC channel #juju and ping either me (user: kjackal, European time) or @chuckbutler (user: lazyPower, US time) so we can interact in real time.

Collaborator

chuckbutler commented May 31, 2017

@Cynerva I believe this is related to the discovery that LXD mandates we move kube-proxy from kernel-space (iptables) to the user-space proxy, yeah?

Contributor

Cynerva commented May 31, 2017

@chuckbutler This does sound like the same issue that setting proxy-mode=userspace fixed, yeah. I don't know if it's a universal problem in LXD deployments though; I haven't tried to repro it myself.

Collaborator

chuckbutler commented Jun 21, 2017

@jayeshnazre Can you confirm that, by adjusting the proxy-mode flag in the kube-proxy unit's arguments, you are able to resolve DNS correctly and communicate between pods using the DNS name? We have seen service VIPs misbehave when running in iptables mode.

This leads me to believe we need to expose proxy-mode as a tunable option on the worker charm, regardless of the resolution here. There are scenarios where one mode works and the other does not, and our sensible defaults are breaking a class of deployments.
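Concretely, the adjustment would look roughly like this on the worker (the systemd unit name for the kube-proxy snap below is an assumption; adjust it to whatever the worker actually runs):

$ juju ssh kubernetes-worker/0
# append the flag to the kube-proxy snap arguments
$ echo '--proxy-mode=userspace' | sudo tee -a /var/snap/kube-proxy/current/args
# restart kube-proxy so the new flag takes effect (unit name assumed)
$ sudo systemctl restart snap.kube-proxy.daemon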

I have the same issue on a brand new Ubuntu 16.04 VM (vagrant box 'bento/ubuntu-16.04', v2.3.7), following https://kubernetes.io/docs/getting-started-guides/ubuntu/ with conjure-up, the localhost provider, and CDK with the default deployment (3 kubernetes-workers, etc.).
I then follow the Kubernetes tutorial at https://kubernetes.io/docs/tutorials/kubernetes-basics/:

$ kubectl run kubernetes-bootcamp --image=docker.io/jocatalin/kubernetes-bootcamp:v1 --port 8080
$ kubectl scale deployment kubernetes-bootcamp --replicas=3
$ kubectl expose deploy/kubernetes-bootcamp --port 8080
# test on each kubernetes-bootcamp pod:
$ kubectl exec -it kubernetes-bootcamp-2457653786-3r69q -- curl -vvv kubernetes-bootcamp:8080
...
 $ kubectl exec -it kubernetes-bootcamp-2457653786-tn7lg -- curl -vvv kubernetes-bootcamp:8080
* Rebuilt URL to: kubernetes-bootcamp:8080/
* Hostname was NOT found in DNS cache
* Could not resolve host: kubernetes-bootcamp
* Closing connection 0
curl: (6) Could not resolve host: kubernetes-bootcamp

In this case it fails on the pod scheduled on the same node as the single kube-dns pod.

Digging deeper with tshark (logs from an older vm):

$ tshark -i any -P &
# OK:
root@kubernetes-bootcamp-3271566451-1x5n2:/# curl kubernetes-bootcamp:8080 -vvv
- Rebuilt URL to: kubernetes-bootcamp:8080/
- Hostname was NOT found in DNS cache
-   Trying 10.152.183.138...
- Connected to kubernetes-bootcamp (10.152.183.138) port 8080 (#0)
> GET / HTTP/1.1
> User-Agent: curl/7.38.0
> Host: kubernetes-bootcamp:8080
> Accept: */*
>
< HTTP/1.1 200 OK
< Content-Type: text/plain
< Date: Mon, 10 Jul 2017 09:18:52 GMT
< Connection: keep-alive
< Transfer-Encoding: chunked
<
Hello Kubernetes bootcamp! | Running on: kubernetes-bootcamp-3271566451-3nlbw | v=1
- Connection #0 to host kubernetes-bootcamp left intact
root@kubernetes-bootcamp-3271566451-1x5n2:/#   1   0.000000   10.1.101.2 -> 10.152.183.10 DNS 107 Standard query 0x9ac7  A kubernetes-bootcamp.default.svc.cluster.local
  2   0.000073   10.1.101.2 -> 10.152.183.10 DNS 107 Standard query 0x8ba2  AAAA kubernetes-bootcamp.default.svc.cluster.local
  3   0.000236 10.152.183.10 -> 10.1.101.2   DNS 107 Standard query response 0x8ba2
  4   0.000419 10.152.183.10 -> 10.1.101.2   DNS 123 Standard query response 0x9ac7  A 10.152.183.138
  5   0.003775   10.1.101.2 -> 10.152.183.138 TCP 76 50230→8080 [SYN] Seq=0 Win=28200 Len=0 MSS=1410 SACK_PERM=1 TSval=64869484 TSecr=0 WS=128
  6   0.003928 10.152.183.138 -> 10.1.101.2   TCP 76 8080→50230 [SYN, ACK] Seq=0 Ack=1 Win=27960 Len=0 MSS=1410 SACK_PERM=1 TSval=64869485 TSecr=64869484 WS=128
  7   0.003936   10.1.101.2 -> 10.152.183.138 TCP 68 50230→8080 [ACK] Seq=1 Ack=1 Win=28288 Len=0 TSval=64869485 TSecr=64869485
  8   0.003984   10.1.101.2 -> 10.152.183.138 HTTP 156 GET / HTTP/1.1
  9   0.003999 10.152.183.138 -> 10.1.101.2   TCP 68 8080→50230 [ACK] Seq=1 Ack=89 Win=28032 Len=0 TSval=64869485 TSecr=64869485
 10   0.004316 10.152.183.138 -> 10.1.101.2   TCP 249 [TCP segment of a reassembled PDU]
 11   0.004322   10.1.101.2 -> 10.152.183.138 TCP 68 50230→8080 [ACK] Seq=89 Ack=182 Win=29312 Len=0 TSval=64869485 TSecr=64869485
 12   0.004372 10.152.183.138 -> 10.1.101.2   TCP 110 [TCP segment of a reassembled PDU]
 13   0.004375   10.1.101.2 -> 10.152.183.138 TCP 68 50230→8080 [ACK] Seq=89 Ack=224 Win=29312 Len=0 TSval=64869485 TSecr=64869485
 14   0.004408 10.152.183.138 -> 10.1.101.2   HTTP 85 HTTP/1.1 200 OK  (text/plain)
 15   0.004411   10.1.101.2 -> 10.152.183.138 TCP 68 50230→8080 [ACK] Seq=89 Ack=241 Win=29312 Len=0 TSval=64869485 TSecr=64869485
 16   0.004834   10.1.101.2 -> 10.152.183.138 TCP 68 50230→8080 [FIN, ACK] Seq=89 Ack=241 Win=29312 Len=0 TSval=64869485 TSecr=64869485
 17   0.005211 10.152.183.138 -> 10.1.101.2   TCP 68 8080→50230 [FIN, ACK] Seq=241 Ack=90 Win=28032 Len=0 TSval=64869485 TSecr=64869485
 18   0.005221   10.1.101.2 -> 10.152.183.138 TCP 68 50230→8080 [ACK] Seq=90 Ack=242 Win=29312 Len=0 TSval=64869485 TSecr=64869485

# KO
root@kubernetes-bootcamp-3271566451-1x5n2:/# curl kubernetes-bootcamp:8080 -vvv
- Rebuilt URL to: kubernetes-bootcamp:8080/
- Hostname was NOT found in DNS cache
-   Trying 10.152.183.138...
 19   2.263190   10.1.101.2 -> 10.152.183.10 DNS 107 Standard query 0x7e1f  A kubernetes-bootcamp.default.svc.cluster.local
 20   2.263265   10.1.101.2 -> 10.152.183.10 DNS 107 Standard query 0x645c  AAAA kubernetes-bootcamp.default.svc.cluster.local
 21   2.263345 10.152.183.10 -> 10.1.101.2   DNS 123 Standard query response 0x7e1f  A 10.152.183.138
 22   2.263390 10.152.183.10 -> 10.1.101.2   DNS 107 Standard query response 0x645c
 23   2.267176   10.1.101.2 -> 10.152.183.138 TCP 76 50244→8080 [SYN] Seq=0 Win=28200 Len=0 MSS=1410 SACK_PERM=1 TSval=64870050 TSecr=0 WS=128
 24   2.267231   10.1.101.3 -> 10.1.101.2   TCP 76 8080→50244 [SYN, ACK] Seq=0 Ack=1 Win=27960 Len=0 MSS=1410 SACK_PERM=1 TSval=64870050 TSecr=64870050 WS=128
 25   2.267240   10.1.101.2 -> 10.1.101.3   TCP 56 50244→8080 [RST] Seq=1 Win=0 Len=0
 26   3.264707   10.1.101.2 -> 10.152.183.138 TCP 76 [TCP Retransmission] 50244→8080 [SYN] Seq=0 Win=28200 Len=0 MSS=1410 SACK_PERM=1 TSval=64870300 TSecr=0 WS=128
 27   3.264768   10.1.101.3 -> 10.1.101.2   TCP 76 [TCP Previous segment not captured] 8080→50244 [SYN, ACK] Seq=15586560 Ack=1 Win=27960 Len=0 MSS=1410 SACK_PERM=1 TSval=64870300 TSecr=64870300 WS=128
 28   3.264775   10.1.101.2 -> 10.1.101.3   TCP 56 50244→8080 [RST] Seq=1 Win=0 Len=0
 29   5.010722 0a:58:0a:01:65:01 ->              ARP 44 Who has 10.1.101.2?  Tell 10.1.101.1
 30   5.010731 0a:58:0a:01:65:02 ->              ARP 44 10.1.101.2 is at 0a:58:0a:01:65:02
 31   5.267798   10.1.101.2 -> 10.152.183.138 TCP 76 [TCP Retransmission] 50244→8080 [SYN] Seq=0 Win=28200 Len=0 MSS=1410 SACK_PERM=1 TSval=64870801 TSecr=0 WS=128
 32   5.267873   10.1.101.3 -> 10.1.101.2   TCP 76 [TCP Previous segment not captured] 8080→50244 [SYN, ACK] Seq=46885077 Ack=1 Win=27960 Len=0 MSS=1410 SACK_PERM=1 TSval=64870801 TSecr=64870801 WS=128
 33   5.267882   10.1.101.2 -> 10.1.101.3   TCP 56 50244→8080 [RST] Seq=1 Win=0 Len=0
 34   7.279964 0a:58:0a:01:65:02 ->              ARP 44 Who has 10.1.101.3?  Tell 10.1.101.2
 35   7.280030 0a:58:0a:01:65:03 ->              ARP 44 Who has 10.1.101.2?  Tell 10.1.101.3
 36   7.280053 0a:58:0a:01:65:02 ->              ARP 44 10.1.101.2 is at 0a:58:0a:01:65:02
 37   7.280085 0a:58:0a:01:65:03 ->              ARP 44 10.1.101.3 is at 0a:58:0a:01:65:03
 38   9.279817   10.1.101.2 -> 10.152.183.138 TCP 76 [TCP Retransmission] 50244→8080 [SYN] Seq=0 Win=28200 Len=0 MSS=1410 SACK_PERM=1 TSval=64871804 TSecr=0 WS=128
 39   9.279908   10.1.101.3 -> 10.1.101.2   TCP 76 [TCP Previous segment not captured] 8080→50244 [SYN, ACK] Seq=109573130 Ack=1 Win=27960 Len=0 MSS=1410 SACK_PERM=1 TSval=64871804 TSecr=64871804 WS=128
 40   9.279916   10.1.101.2 -> 10.1.101.3   TCP 56 50244→8080 [RST] Seq=1 Win=0 Len=0
 41  17.295845   10.1.101.2 -> 10.152.183.138 TCP 76 [TCP Retransmission] 50244→8080 [SYN] Seq=0 Win=28200 Len=0 MSS=1410 SACK_PERM=1 TSval=64873808 TSecr=0 WS=128
 42  17.295928   10.1.101.3 -> 10.1.101.2   TCP 76 [TCP Previous segment not captured] 8080→50244 [SYN, ACK] Seq=234823444 Ack=1 Win=27960 Len=0 MSS=1410 SACK_PERM=1 TSval=64873808 TSecr=64873808 WS=128
 43  17.295934   10.1.101.2 -> 10.1.101.3   TCP 56 50244→8080 [RST] Seq=1 Win=0 Len=0

Here the DNS query works, but the tcp connection with the VIP service doesn't:

 23   2.267176   10.1.101.2 -> 10.152.183.138 TCP 76 50244→8080 [SYN] Seq=0 Win=28200 Len=0 MSS=1410 SACK_PERM=1 TSval=64870050 TSecr=0 WS=128
 24   2.267231   10.1.101.3 -> 10.1.101.2   TCP 76 8080→50244 [SYN, ACK] Seq=0 Ack=1 Win=27960 Len=0 MSS=1410 SACK_PERM=1 TSval=64870050 TSecr=64870050 WS=128

The DNAT works for the SYN packet, but not for the response SYN-ACK: its source IP is the pod's IP, not the service VIP, so it is dropped by the network stack.
I reviewed the iptables rules but didn't spot any issue (and the rules are the same on each worker).
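For anyone following along, those checks can be reproduced on the worker roughly like this (conntrack-tools may need to be installed first; 10.152.183.138 is the service VIP from the capture above):

$ sudo iptables-save -t nat | grep 10.152.183.138     # DNAT rules for the service VIP (KUBE-SERVICES / KUBE-SEP chains)
$ sudo conntrack -L -d 10.152.183.138                 # conntrack entries that should un-NAT the returning SYN-ACK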

I reproduced some issues each time I re-created the vagrant VM (5-6 times now), but sometimes it's the kube-dns service that fails, sometimes the kubernetes-bootcamp one; sometimes some pods fail systematically, sometimes it fails only once every ~5-10 requests...

Reproduced with channel 1.6/stable and 1.7/stable, with linux 4.4 and 4.8.

Adding proxy-mode=userspace to /var/snap/kube-proxy/current/args on the LXC kubernetes-workers does seem to resolve the issue, but iptables is the default setting, and this flag is not exposed as an easily accessible option...

Should we open an issue on kubernetes? Is the iptables proxy mode expected to always work?

Contributor

Cynerva commented Jul 10, 2017

Since we've only seen reports of this on LXD deployments, I'm suspicious of some sort of iptables conflict between the LXD containers. Hard to say, since I can't repro.

We could update the charm to set proxy-mode=userspace on LXD by default. AFAIK it's not as performant, but as long as it still performs reasonably, I think that should be fine.
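A simple way for the charm to decide this would be to check whether it is running inside a container, e.g. along these lines (just a sketch of the idea, not the actual charm code):

$ systemd-detect-virt --container   # prints "lxc" inside an LXD/LXC container, "none" elsewhere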

@Cynerva Is my scenario not enough to repro? It is all running in a single VM, so it should be sufficiently isolated (hopefully the VM network setup and the virtualization technology do not affect internal routing; I use Vagrant with VirtualBox 5.0 and have tested with the VM network both bridged and NATed).

wwwtyro added this to Bug in CDK on Jan 10, 2018
