
How to debug ovn load balancer? #189

Closed
rajatchopra opened this issue Jan 4, 2018 · 4 comments

Comments

@rajatchopra

On a cluster with 40 nodes, the service is stored in the OVN load balancer table, but it does not work from some of the nodes. How can I debug what is wrong?

[root@netdev75-2 ~]# ovn-nbctl --db="tcp:10.254.72.1:6641" find load_balancer
_uuid               : e9e9c620-ffa3-4560-b761-9f31a7c15674
external_ids        : {"k8s-cluster-lb-tcp"=yes}
name                : ""
protocol            : []
vips                : {"172.30.0.1:443"="10.254.72.1:8443", "172.30.0.1:53"="10.254.72.1:8053"}

_uuid               : 5318319a-3f38-40a9-8909-cc0166007fcd
external_ids        : {"k8s-cluster-lb-udp"=yes}
name                : ""
protocol            : udp
vips                : {"172.30.0.1:53"="10.254.72.1:8053"}
[root@netdev75-2 ~]# curl -k https://10.254.72.1:8443
{
  "paths": [
    "/api",
    "/api/v1",
...
...
[root@netdev75-2 ~]# curl -k https://172.30.0.1:443/
curl: (7) Failed to connect to 172.30.0.1 port 443: Connection timed out


shettyg (Collaborator) commented Jan 5, 2018

@rajatchopra

  1. Can it be reached via the service IP from some of the nodes but not from others?
    In that case, does the loaded openvswitch kernel module have NAT support?

  2. Running 'ovs-dpctl dump-flows' while the curl is in progress can tell you whether NAT is being asked to run (a minimal sketch follows below).
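
For example, a minimal sketch, assuming it is run on the affected node while reproducing the failure (the VIP 172.30.0.1 is taken from the output above; substitute your own service IP):

# watch datapath flows that match the service VIP
watch -n1 'ovs-dpctl dump-flows | grep 172.30.0.1'
# in a second terminal, reproduce the failure
curl -k https://172.30.0.1:443/

If NAT is being applied, a flow with a ct(...,nat(dst=...)) action should appear and its packet counters should increment.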

The following is likely unrelated, since you are running curl from the node itself, but just in case:

  1. Is 10.254.72.1 the master's IP, or rather kube-apiserver's IP, or a pod's IP? And where is the gateway node initialized? We have a bug where, if the gateway is on the same node as kube-apiserver, a pod cannot reach kube-apiserver when kube-apiserver's IP and the gateway IP are the same. If you start kube-apiserver advertising the IP of the OVN mgmt port instead, then it works (a sketch follows below).
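
A sketch of that workaround, assuming the management port follows this repo's k8s-<node-name> naming pattern; the interface name and IP below are placeholders, so look up your own with ip addr on the master. The flag name --advertise-address is kube-apiserver's standard option for this:

# find the OVN management port address on the master
ip addr show | grep -A 2 'k8s-'
# start kube-apiserver advertising the management port IP
# (10.128.28.1 is a placeholder; remaining kube-apiserver flags omitted)
kube-apiserver --advertise-address=10.128.28.1 ...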

@rajatchopra (Author)

It can only be reached from the master where 10.254.72.1 is one of the interfaces.

The cluster does not have a gateway configured yet.

The NAT modules appear to be loaded (lsmod output):

nf_nat_ipv4            16384  2 openvswitch,iptable_nat
nf_nat                 28672  5 xt_nat,openvswitch,nf_nat_ipv6,nf_nat_masquerade_ipv4,nf_nat_ipv4

dump-flows does not give me much of a clue, apart from showing that the address needs to be NAT'ed:

recirc_id(0x1ce7),in_port(4),ct_state(+new-est-rel-rpl-inv+trk),ct_label(0/0x1),eth(src=00:00:00:00:00:00/01:00:00:00:00:00,dst=00:00:00:56:84:7a),eth_type(0x0800),ipv4(src=10.128.28.3,dst=10.192.0.0/255.192.0.0,ttl=64,frag=no), packets:0, bytes:0, used:never, actions:ct(commit,zone=86,label=0/0x1)
recirc_id(0),in_port(4),eth(src=00:00:00:00:00:00/01:00:00:00:00:00),eth_type(0x0800),ipv4(dst=172.30.0.1,frag=no), packets:0, bytes:0, used:never, actions:ct(zone=86),recirc(0x1ce5)
recirc_id(0x1ce6),in_port(4),eth(dst=00:00:00:56:84:7a),eth_type(0x0800),ipv4(frag=no), packets:0, bytes:0, used:never, actions:ct(zone=86),recirc(0x1ce7)
recirc_id(0x1ce5),in_port(4),ct_state(+new-est-rel-rpl-inv+trk),ct_label(0/0x1),eth(src=0a:00:00:00:00:03,dst=00:00:00:56:84:7a),eth_type(0x0800),ipv4(src=10.128.28.3,dst=172.30.0.1,proto=6,frag=no),tcp(src=37214,dst=443), packets:0, bytes:0, used:never, actions:ct(commit,zone=86,nat(dst=10.254.72.1:8443)),recirc(0x1ce6)

Another data point: there is another service that points to a pod. That service works from within other pods, but does not work from a host node.

shettyg (Collaborator) commented Jan 8, 2018

> It can only be reached from the master where 10.254.72.1 is one of the interfaces.

So without a gateway configured on a minion, this will not work. The node IP is not in the same virtual address space as the logical switch IPs, so the packets need a way to exit the virtual space. Alternatively, you can set kube-apiserver to advertise the local OVN mgmt port IP, as described above.

> Another data point: there is another service that points to a pod. That service works from within other pods, but does not work from a host node.

The host likely does not have a route to the service IP. If your pod IPs are in 192.168.0.0/16 and the service IP range is 192.168.200.0/24, it works, because we add a route on the host saying that 192.168.0.0/16 is reachable via the local mgmt port. In your case that is likely not so, and a route will have to be added (a minimal sketch follows below).
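
A minimal sketch of such a route, assuming the service CIDR is 172.30.0.0/16 (inferred from the 172.30.0.1 VIP above) and the management port is named k8s-netdev75-2 per the k8s-${NODE_NAME} pattern; substitute your cluster's values:

# on the host that needs to reach service IPs
ip route add 172.30.0.0/16 dev k8s-netdev75-2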

@rajatchopra (Author)

Thanks @shettyg, this should have been obvious. I don't know what I was thinking. Closing the issue.
Summary:

  • One needs a gateway to be able to route to the external network (a sketch of gateway initialization follows below).
  • One needs to add a routing table entry on the host for the service CIDR, if access to services is required from the host (use the k8s-${NODE_NAME}.. device).
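
For reference, a sketch of gateway initialization with the ovn-k8s-overlay utility from this repo; every value below is a placeholder and the exact flags may differ between versions, so check gateway-init --help before running:

# run on the node that should act as the gateway (all values are placeholders)
ovn-k8s-overlay gateway-init \
  --cluster-ip-subnet="192.168.0.0/16" \
  --node-name="netdev75-2" \
  --physical-interface=eth1 \
  --physical-ip="10.254.72.20/24" \
  --default-gw="10.254.72.254"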
