
layer2 only announces when speaker nodes overlap with service pod nodes #322

Closed
michaelfig opened this issue Oct 4, 2018 · 4 comments

michaelfig commented Oct 4, 2018

Is this a bug report or a feature request?:

Bug report.

What happened:

I have a single nginx-ingress, which metallb has correctly bound to 192.168.1.50 and which is scheduled on pve1:

root@pve1:~# kubectl get svc -lrelease=ingress
NAME                                    TYPE           CLUSTER-IP        EXTERNAL-IP    PORT(S)                      AGE
ingress-nginx-ingress-controller        LoadBalancer   192.168.223.203   192.168.1.50   80:30195/TCP,443:30110/TCP   20h
ingress-nginx-ingress-default-backend   ClusterIP      192.168.223.218   <none>         80/TCP                       20h
root@pve1:~# kubectl get po -owide
NAME                                                     READY   STATUS    RESTARTS   AGE    IP              NODE   NOMINATED NODE
[...]
ingress-nginx-ingress-controller-5bf5d9cf7d-2lhcv        1/1     Running   0          17m    192.168.225.5   pve1  <none>
[...]
metallb-controller-6cd5c74d64-59vjx                      1/1     Running   0          179m   192.168.226.8   pve2   <none>
metallb-speaker-ckcsd                                    1/1     Running   0          179m   192.168.1.190   pve3   <none>
metallb-speaker-k5q9m                                    1/1     Running   0          179m   192.168.1.25    pve2   <none>

I can connect just fine to this service from any machine within the cluster:

root@pve1:~# nc -vz 192.168.1.50 443
192.168.1.50: inverse host lookup failed: Host name lookup failure
(UNKNOWN) [192.168.1.50] 443 (https) open
root@pve1:~#

However, I can't connect from outside the cluster (on my local workstation):

Michael-Macbook-655:~ michael$ nc -vz 192.168.1.50 443
nc: connectx to 192.168.1.50 port 443 (tcp) failed: Operation timed out
Michael-Macbook-655:~ michael$ 

What you expected to happen:

The nc from my local workstation should succeed, just as it does from within the cluster.

How to reproduce it (as minimally and precisely as possible):

Use Kubernetes 1.12.0 and metallb 0.7.3 in layer2 mode. Have a service pod run only on nodes that do not have a metallb-speaker pod scheduled.

Anything else we need to know?:

I'm using metallb in layer2 mode, and have labelled the two of my three workers that are on the same subnet (the others are on a different subnet) with:

$ kubectl label no pve2 pve3 subnet=192.168.1.0

Then I set the nodeSelector on the metallb-speaker daemonset:

$ kubectl patch ds metallb-speaker -p '{"spec":{"template":{"spec":{"nodeSelector":{"subnet":"192.168.1.0"}}}}}'

This configuration worked fine with Kubernetes 1.10 and metallb 0.6.2.

When I run "tcpdump -ni vmbr0 host 192.168.1.50" on the two nodes with the metallb-speaker running, I find they both see ARP requests when I attempt the nc from my workstation:

10:03:33.743751 ARP, Request who-has 192.168.1.50 tell 192.168.1.160, length 46
10:03:34.750430 ARP, Request who-has 192.168.1.50 tell 192.168.1.160, length 46
10:03:34.750435 ARP, Request who-has 192.168.1.50 tell 192.168.1.160, length 46
10:03:35.760556 ARP, Request who-has 192.168.1.50 tell 192.168.1.160, length 46

However, neither of the hosts responds, even though I can see in the logs that responders were created:

root@pve1:~# kubectl logs  metallb-speaker-mv9xd | grep vmbr0
{"caller":"announcer.go:89","event":"createARPResponder","interface":"vmbr0","msg":"created ARP responder for interface","ts":"2018-10-04T15:22:34.026690143Z"}
{"caller":"announcer.go:98","event":"createNDPResponder","interface":"vmbr0","msg":"created NDP responder for interface","ts":"2018-10-04T15:22:34.026996242Z"}
root@pve1:~# kubectl logs  metallb-speaker-s75br | grep vmbr0
{"caller":"announcer.go:89","event":"createARPResponder","interface":"vmbr0","msg":"created ARP responder for interface","ts":"2018-10-04T15:22:25.82315491Z"}
{"caller":"announcer.go:98","event":"createNDPResponder","interface":"vmbr0","msg":"created NDP responder for interface","ts":"2018-10-04T15:22:25.82373868Z"}
root@pve1:~# 

Environment:

  • MetalLB version: 0.7.3
  • Kubernetes version: 1.12.0
  • BGP router type/version: n/a (layer2)
  • OS (e.g. from /etc/os-release): Proxmox PVE (Debian GNU/Linux 9 (stretch))
  • Kernel (e.g. uname -a): Linux pve1 4.15.18-5-pve #1 SMP PVE 4.15.18-24 (Thu, 13 Sep 2018 09:15:10 +0200) x86_64 GNU/Linux
michaelfig (Author) commented:

I found a way to make it work: as long as the nginx-ingress service pod is scheduled on one of the subnet=192.168.1.0 nodes, metallb announces correctly.

If the service pod is scheduled on one of the other nodes (not running metallb-speaker), then no metallb-speaker announces the 192.168.1.50 address and external routing fails. I think this is caused by the shouldAnnounce logic, which never announces a service IP when none of its endpoints is on a node running metallb-speaker:

https://github.com/google/metallb/blob/master/speaker/layer2_controller.go#L73
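
To illustrate, here is a minimal sketch of that election logic in Go. This is not MetalLB's actual code (the real implementation picks the winner with a hash rather than a plain name sort, and these identifiers are made up), but it reproduces the failure mode:

package main

import (
	"fmt"
	"sort"
)

// shouldAnnounce: a node is eligible to announce a service IP only if
// it runs a speaker AND hosts one of the service's endpoints; the
// first eligible node wins the election.
func shouldAnnounce(myNode string, speakers map[string]bool, endpointNodes []string) bool {
	var usable []string
	for _, n := range endpointNodes {
		if speakers[n] {
			usable = append(usable, n)
		}
	}
	sort.Strings(usable)
	// If every endpoint lives on a non-speaker node, usable is empty
	// and no speaker announces the IP at all -- the failure seen here.
	return len(usable) > 0 && usable[0] == myNode
}

func main() {
	speakers := map[string]bool{"pve2": true, "pve3": true}
	// Endpoint pod scheduled on pve1, which runs no speaker:
	fmt.Println(shouldAnnounce("pve2", speakers, []string{"pve1"})) // false
	fmt.Println(shouldAnnounce("pve3", speakers, []string{"pve1"})) // false
	// Endpoint on a speaker node: exactly one speaker answers.
	fmt.Println(shouldAnnounce("pve2", speakers, []string{"pve2"})) // true
}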

michaelfig changed the title from "metallb layer2 can only connect within the cluster" to "metallb layer2 only announces when speaker nodes overlap with service pod nodes" on Oct 4, 2018
michaelfig changed the title from "metallb layer2 only announces when speaker nodes overlap with service pod nodes" to "layer2 only announces when speaker nodes overlap with service pod nodes" on Oct 4, 2018
danderson (Contributor) commented:

L2 mode doesn't work correctly if you have a cluster with multiple subnets. This is a hard limitation of the network protocols, nothing I can fix. If your cluster is large enough to have multiple subnets, you will need to use BGP mode to distribute IPs.


tbrtje commented Jun 25, 2020

This issue shouldn't be closed. I cannot run metallb on all of my nodes, so I only run it on a subset. Any workload that isn't running on one of the metallb nodes won't be announced by metallb.
This has nothing to do with subnets.

johananl (Member) commented:

Hello @tbrtje. As @danderson said, if your requirement is to announce a service from a node which doesn't run a MetalLB speaker, you can achieve that in BGP mode with externalTrafficPolicy set to Cluster. If you'd like to see new functionality introduced into MetalLB, feel free to open an issue describing the desired functionality in detail as well as the rationale for adding it to the project. Doing so will help the maintainers figure out whether the change makes sense and can be implemented in a reasonable manner.
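
For reference, a hypothetical one-liner for setting that policy on the service from the original report (externalTrafficPolicy is a standard Kubernetes Service field; substitute your own service name):

$ kubectl patch svc ingress-nginx-ingress-controller -p '{"spec":{"externalTrafficPolicy":"Cluster"}}'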
