Node Port issue on Linux Minion #611

Closed

lanoxx opened this issue Feb 13, 2019 · 7 comments

@lanoxx
Contributor

lanoxx commented Feb 13, 2019

I have problems accessing the node port of a service running on my Linux minion (ip-172-33-69-225):

root@ip-172-33-69-225:/home/ubuntu# kubectl get svc --all-namespaces
NAMESPACE     NAME                   TYPE        CLUSTER-IP    EXTERNAL-IP   PORT(S)                  AGE
default       kubernetes             ClusterIP   10.96.0.1     <none>        443/TCP                  45h
default       win-webserver          NodePort    10.96.0.94    <none>        80:32114/TCP             23h
kube-system   kube-dns               ClusterIP   10.96.0.10    <none>        53/UDP,53/TCP,9153/TCP   45h
kube-system   kubernetes-dashboard   NodePort    10.96.0.207   <none>        80:32306/TCP             143m
root@ip-172-33-69-225:/home/ubuntu# curl 10.96.0.207:32306 #this just hangs for ever
^C
root@ip-172-33-69-225:/home/ubuntu# curl 172.33.69.225:32306
curl: (7) Failed to connect to 172.33.69.225 port 32306: Connection refused

How can I debug this problem? I checked all the logs but could not find any issue.

@girishmg
Member

What is the output of ovn-nbctl lb-list?
Are the Core DNS PODs up and running?
What is the git commit ID of the ovn-k8s in your deployment?

@shettyg
Collaborator

shettyg commented Feb 13, 2019

You start with:

On master:

  1. ovn-nbctl list load-balancer
  2. Look for the record for the node in question (It should be something like GR_$nodename)
  3. See if the NodePort entry that you were looking for exists.
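For example, you can pull up just the record for one node like this (a sketch; GR_<nodename> is a placeholder for your node's gateway router, and it assumes the TCP_lb_gateway_router external_ids key that ovn-k8s sets on these load balancers):

# ovn-nbctl find load_balancer external_ids:TCP_lb_gateway_router="GR_<nodename>"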

On Node:

  1. Dump the flows of the physical OVS bridge (ovs-ofctl dump-flows $bridgename). You should see an entry for the nodeport in question
  2. Make sure that ovn-controller is running on that node.
  3. Make sure that the chassis in question is registered in SB DB
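For example (a sketch; the bridge name and node port below are placeholders, substitute your own):

On the node:
# ovs-ofctl dump-flows <bridgename> | grep tp_dst=<nodeport>
# pgrep -a ovn-controller

Against the SB DB (for example on the master), the registered chassis show up in:
# ovn-sbctl show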

OVS

  1. Does the OVS kernel module on that node support NAT? (lsmod | grep openvswitch)
  2. Does ovs-vswitchd.log in /var/log/openvswitch show errors for NAT?
  3. What version of OVS? (ovs-appctl version)

Logs:

  1. ovnkube log for obvious errors
  2. ovn-northd log for obvious errors
  3. ovn-controller log for obvious errors
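If OVS/OVN were installed from packages, ovn-northd and ovn-controller usually log under /var/log/openvswitch/ (a guess; the exact paths depend on your deployment, and the ovnkube log location is wherever your setup directs it):

On the master:
# less /var/log/openvswitch/ovn-northd.log

On each node:
# less /var/log/openvswitch/ovn-controller.log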

@lanoxx
Contributor Author

lanoxx commented Feb 15, 2019

TL;DR: I came to realize that the node port is only accessible from outside the respective machine. The node ports on my Linux node actually work when accessed from outside the machine. But on the Windows node only one of the two node ports is accessible.

@girishmg Yes, core DNS pods are up and running.

The output of ovn-nbctl lb-list is:

UUID                                    LB                  PROTO      VIP                    IPs
c8c93c0b-f9d1-426e-853d-108a7c630681                        udp        10.96.0.10:53          10.245.0.3:53,10.245.0.4:53
b35f250f-959c-4df5-872e-3583ab8030ad                        (null)     172.33.69.225:32114    10.245.1.5:80,10.245.1.6:80
                                                            (null)     172.33.69.225:32306    10.245.0.6:8080
e88d5884-086d-47f6-8cf5-763d83c531c9                        tcp        10.96.0.10:53          10.245.0.3:53,10.245.0.4:53
                                                            tcp        10.96.0.10:9153        10.245.0.3:9153,10.245.0.4:9153
                                                            tcp        10.96.0.1:443          172.33.75.8:443
                                                            tcp        10.96.0.207:80         10.245.0.6:8080
                                                            tcp        10.96.0.94:80          10.245.1.5:80,10.245.1.6:80
51767c32-ed31-4820-8278-747fc9ad69eb                        (null)     172.33.66.23:32114     10.245.1.5:80,10.245.1.6:80
                                                            (null)     172.33.66.23:32306     10.245.0.6:8080

I am not sure where to look up the commit ID; it seems the build folder is under /tmp, which probably gets cleaned when I reboot the machine. But according to ubuntu.yml it builds from the master branch of the ovn-kubernetes repository, and I had set up a fresh cluster when I posted the issue two days ago, so it must have been the latest commit at that time.

@shettyg When I run ovn-nbctl list load-balancer it includes the following output:

_uuid               : b35f250f-959c-4df5-872e-3583ab8030ad
external_ids        : {TCP_lb_gateway_router="GR_ip-172-33-69-225.eu-central-1.compute.internal"}
name                : ""
protocol            : []
vips                : {"172.33.69.225:32114"="10.245.1.5:80,10.245.1.6:80", "172.33.69.225:32306"="10.245.0.6:8080"}

On further debugging I realized that I cannot access the node port from the machine itself (connection refused). However, when I am on the master (172.33.75.8) I can access 172.33.69.225:32306 (my Linux node), but not 172.33.66.23:32306 (my Windows node). What puzzles me a bit is that two days ago my load balancer target group was showing the Linux node as unhealthy, and now it shows it as healthy (though I did not change anything). It is possible that I did not give the load balancer enough time to perform the target health checks.

Executing ovs-ofctl dump-flows brens5 on the Linux node gives the following output:

cookie=0x0, duration=156343.149s, table=0, n_packets=1294930, n_bytes=168832046, priority=100,ip,in_port="k8s-patch-brens" actions=ct(commit,zone=64000),output:ens5
cookie=0x0, duration=156343.146s, table=0, n_packets=1936735, n_bytes=986813654, priority=50,ip,in_port=ens5 actions=ct(table=1,zone=64000)
cookie=0x0, duration=156343.094s, table=0, n_packets=0, n_bytes=0, priority=100,tcp,in_port=ens5,tp_dst=32114 actions=output:"k8s-patch-brens"
cookie=0x0, duration=156343.091s, table=0, n_packets=312727, n_bytes=23798709, priority=100,tcp,in_port=ens5,tp_dst=32306 actions=output:"k8s-patch-brens"
cookie=0x0, duration=156344.761s, table=0, n_packets=755053, n_bytes=75613009, priority=0 actions=NORMAL
cookie=0x0, duration=156343.143s, table=1, n_packets=1193981, n_bytes=559047604, priority=100,ct_state=+est+trk actions=output:"k8s-patch-brens"
cookie=0x0, duration=156343.137s, table=1, n_packets=0, n_bytes=0, priority=100,ct_state=+rel+trk actions=output:"k8s-patch-brens"
cookie=0x0, duration=156343.133s, table=1, n_packets=742810, n_bytes=427769074, priority=0 actions=LOCAL

I can see both service ports in the output (e.g. 32306 and 32114).

On the Windows node I could not find a line for port 32306 but there is one for 32114:

cookie=0x0, duration=169428.668s, table=0, n_packets=0, n_bytes=0, priority=100,tcp,in_port="Ethernet 2",tp_dst=32114 actions=output:"k8s-patch-vEthe"
cookie=0x0, duration=169428.977s, table=0, n_packets=0, n_bytes=0, priority=100,ip,in_port="k8s-patch-vEthe" actions=ct(commit,zone=64000),output:"Ethernet 2"
cookie=0x0, duration=169428.942s, table=0, n_packets=942559, n_bytes=408243687, priority=50,ip,in_port="Ethernet 2" actions=ct(table=1,zone=64000)
cookie=0x0, duration=169446.587s, table=0, n_packets=809667, n_bytes=82447974, priority=0 actions=NORMAL
cookie=0x0, duration=169428.910s, table=1, n_packets=0, n_bytes=0, priority=100,ct_state=+est+trk actions=output:"k8s-patch-vEthe"
cookie=0x0, duration=169428.874s, table=1, n_packets=0, n_bytes=0, priority=100,ct_state=+rel+trk actions=output:"k8s-patch-vEthe"
cookie=0x0, duration=169428.841s, table=1, n_packets=942559, n_bytes=408243687, priority=0 actions=LOCAL

The chassis for both the Linux and Windows nodes are correctly registered in the SB DB, and ovn-controller is running on both nodes.

I am not sure where I can find the log files on the Windows machine; in the C:\kubernetes\ folder I only see ovn-kubernetes-node.log and kubelet.log.

For reference here is the output of all services:

NAMESPACE     NAME                   TYPE        CLUSTER-IP    EXTERNAL-IP   PORT(S)                  AGE     SELECTOR
default       kubernetes             ClusterIP   10.96.0.1     <none>        443/TCP                  3d17h   <none>
default       win-webserver          NodePort    10.96.0.94    <none>        80:32114/TCP             2d19h   app=win-webserver
kube-system   kube-dns               ClusterIP   10.96.0.10    <none>        53/UDP,53/TCP,9153/TCP   3d17h   k8s-app=kube-dns
kube-system   kubernetes-dashboard   NodePort    10.96.0.207   <none>        80:32306/TCP             46h     k8s-app=kubernetes-dashboard

@girishmg
Member

TL;DR: I came to realize that the node port is only accessible from outside the respective machine. The node ports on my Linux node actually work when accessed from outside the machine.

I think this behavior is expected. That is, you cannot access a service using <k8s_node_ip:node_port> from the k8s node itself. On every K8s node we create a management port that provides access to all the PODs from the node. We then add the following routes.

# ip ro
10.0.0.0/20 dev eth0 proto kernel scope link src 10.0.1.16 
10.96.0.0/12 via 192.168.2.1 dev k8s-node1
192.168.0.0/16 via 192.168.2.1 dev k8s-node1 
192.168.2.0/24 dev k8s-node1 proto kernel scope link src 192.168.2.2 

192.168.2.2 is the management port (on br-int) and 192.168.2.0/24 is the overlay subnet.
10.96.0.0/12 is the cluster service subnet. 10.0.1.16 is the k8s node host IP.

Say, we have this service:

# kubectl describe svc web
Name:                     web
Type:                     NodePort
IP:                       10.111.108.32
Port:                     http  80/TCP
TargetPort:               5000/TCP
NodePort:                 http  30224/TCP
Endpoints:                192.168.0.4:5000,192.168.1.4:5000,192.168.2.4:5000

When you now try to access 10.0.1.16:30224, the routing rules forward the packet to the host stack directly. This fails because nothing is listening on TCP port 30224, and hence you get connection refused.
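You can see this with ip route get: the node's own IP resolves to the local routing table, so the packet never reaches br-int (approximate output):

# ip route get 10.0.1.16
local 10.0.1.16 dev lo src 10.0.1.16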

# nc -zv 10.0.1.16 30224
nc: connect to 10.0.1.16 port 30224 (tcp) failed: Connection refused
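You can cross-check that nothing on the host is listening on that port (assuming iproute2's ss is available; the command below returns no output here):

# ss -ltn | grep 30224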

However, if you access the NodePort from within a pod, it works.

# nsenter -t 68648 -n nc -zv 10.0.1.16 30224
Connection to 10.0.1.16 30224 port [tcp/*] succeeded!

So, within a pod this succeeds, and a tcpdump tells you why:

192.168.0.3.35792 > 10.0.1.8.30224: Flags [S], 
100.64.1.3.35792 > 192.168.0.4.5000: Flags [S],
192.168.0.4.5000 > 100.64.1.3.35792: Flags [S.]
10.0.1.8.30224 > 192.168.0.3.35792: Flags [S.],
192.168.0.3.35792 > 10.0.1.8.30224: Flags [.], 

100.64.1.3 is due to lb_force_snat_ip on the L3 Gateway router on the k8s node.
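If you want to see that SNAT address, it is recorded on the node's gateway router (a sketch; GR_<nodename> is a placeholder, look for lb_force_snat_ip in the options column):

# ovn-nbctl --bare --columns=options find logical_router name="GR_<nodename>"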

@lanoxx
Contributor Author

lanoxx commented Feb 25, 2019

@girishmg Thanks for these explanations. I understand now why the port is not available from the host itself. But what I still do not understand is why the dashboard's node port is not accessible on the Windows machine. As I understand services, a node port should be reachable on any node no matter where the service's pods are actually running. If the pods do not run on the node itself, incoming packets are forwarded to the correct machine. At least that's how I understand it.

But for some reason my dashboard service, which runs on the Linux minion and has node port 32306 assigned, is not exposed with a corresponding port on the Windows machine, while the Windows service, which uses node port 32114, does have a corresponding port on the Linux machine.

@girishmg
Member

As I understand services, a node port should be reachable on any node no matter where the service's pods are actually running. If the pods do not run on the node itself, incoming packets are forwarded to the correct machine. At least that's how I understand it.

Your understanding above is correct. That said, I am not sure why you are not able to see the port on the Windows machine. I haven't set up a K8s cluster with Windows as one of the nodes, so I don't know.

@lanoxx
Contributor Author

lanoxx commented Mar 18, 2019

I have re-run the Ansible scripts on my cluster today and rebooted the nodes, and now all NodePorts are accessible from both Windows and Linux; I can also access a Linux pod's NodePort on Windows and vice versa.

Either there was a temporary glitch on my setup, or some recent commits have resolved the issue.
