
layer2 only announces when speaker nodes overlap with service pod nodes #322

Closed
michaelfig opened this issue Oct 4, 2018 · 4 comments

michaelfig commented Oct 4, 2018

Is this a bug report or a feature request?:

Bug report.

What happened:

I have a single nginx-ingress, which metallb has correctly bound to 192.168.1.50 and which is scheduled on pve1:

root@pve1:~# kubectl get svc -lrelease=ingress
NAME                                    TYPE           CLUSTER-IP        EXTERNAL-IP    PORT(S)                      AGE
ingress-nginx-ingress-controller        LoadBalancer   192.168.223.203   192.168.1.50   80:30195/TCP,443:30110/TCP   20h
ingress-nginx-ingress-default-backend   ClusterIP      192.168.223.218   <none>         80/TCP                       20h
root@pve1:~# kubectl get po -owide
NAME                                                     READY   STATUS    RESTARTS   AGE    IP              NODE   NOMINATED NODE
[...]
ingress-nginx-ingress-controller-5bf5d9cf7d-2lhcv        1/1     Running   0          17m    192.168.225.5   pve1  <none>
[...]
metallb-controller-6cd5c74d64-59vjx                      1/1     Running   0          179m   192.168.226.8   pve2   <none>
metallb-speaker-ckcsd                                    1/1     Running   0          179m   192.168.1.190   pve3   <none>
metallb-speaker-k5q9m                                    1/1     Running   0          179m   192.168.1.25    pve2   <none>

I can connect just fine to this service from any machine within the cluster:

root@pve1:~# nc -vz 192.168.1.50 443
192.168.1.50: inverse host lookup failed: Host name lookup failure
(UNKNOWN) [192.168.1.50] 443 (https) open
root@pve1:~#

However, I can't connect from outside the cluster (on my local workstation):

Michael-Macbook-655:~ michael$ nc -vz 192.168.1.50 443
nc: connectx to 192.168.1.50 port 443 (tcp) failed: Operation timed out
Michael-Macbook-655:~ michael$ 

What you expected to happen:

The nc from my local workstation should succeed, just as it does from within the cluster.

How to reproduce it (as minimally and precisely as possible):

Use Kubernetes 1.12.0 and metallb 0.7.3 in layer2 mode. Have a service pod run only on nodes that do not have a metallb-speaker pod scheduled.

Anything else we need to know?:

I'm using metallb in layer2 mode, and have labelled the two of my three workers that are on the same subnet (the others are on a different subnet) with:

$ kubectl label no pve2 pve3 subnet=192.168.1.0

Then I set the nodeSelector on the metallb-speaker daemonset:

$ kubectl patch ds metallb-speaker -p '{"spec":{"template":{"spec":{"nodeSelector":{"subnet":"192.168.1.0"}}}}}'

This configuration worked fine with Kubernetes 1.10 and metallb 0.6.2.

When I run "tcpdump -ni vmbr0 host 192.168.1.50" on the two nodes with the metallb-speaker running, I find they both see ARP requests when I attempt the nc from my workstation:

10:03:33.743751 ARP, Request who-has 192.168.1.50 tell 192.168.1.160, length 46
10:03:34.750430 ARP, Request who-has 192.168.1.50 tell 192.168.1.160, length 46
10:03:34.750435 ARP, Request who-has 192.168.1.50 tell 192.168.1.160, length 46
10:03:35.760556 ARP, Request who-has 192.168.1.50 tell 192.168.1.160, length 46

However, neither of the hosts responds, even though I can see in the logs that responders were created:

root@pve1:~# kubectl logs  metallb-speaker-mv9xd | grep vmbr0
{"caller":"announcer.go:89","event":"createARPResponder","interface":"vmbr0","msg":"created ARP responder for interface","ts":"2018-10-04T15:22:34.026690143Z"}
{"caller":"announcer.go:98","event":"createNDPResponder","interface":"vmbr0","msg":"created NDP responder for interface","ts":"2018-10-04T15:22:34.026996242Z"}
root@pve1:~# kubectl logs  metallb-speaker-s75br | grep vmbr0
{"caller":"announcer.go:89","event":"createARPResponder","interface":"vmbr0","msg":"created ARP responder for interface","ts":"2018-10-04T15:22:25.82315491Z"}
{"caller":"announcer.go:98","event":"createNDPResponder","interface":"vmbr0","msg":"created NDP responder for interface","ts":"2018-10-04T15:22:25.82373868Z"}
root@pve1:~# 

Environment:

  • MetalLB version: 0.7.3
  • Kubernetes version: 1.12.0
  • BGP router type/version: n/a (layer2)
  • OS (e.g. from /etc/os-release): Proxmox PVE (Debian GNU/Linux 9 (stretch))
  • Kernel (e.g. uname -a): Linux pve1 4.15.18-5-pve #1 SMP PVE 4.15.18-24 (Thu, 13 Sep 2018 09:15:10 +0200) x86_64 GNU/Linux
michaelfig (Author) commented:

I found a way to make it work: as long as the nginx-ingress service pod is scheduled on one of the subnet=192.168.1.0 nodes, metallb announces correctly.

If the service pod is scheduled on one of the other nodes (not running metallb-speaker), then no metallb-speaker announces the 192.168.1.50 address and external routing fails. I think this is caused by the shouldAnnounce logic, which never announces a service IP when none of its endpoints is on a node running metallb-speaker:

https://github.com/google/metallb/blob/master/speaker/layer2_controller.go#L73
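
To illustrate, here is a minimal sketch of that election logic in Go. This is not MetalLB's actual code (the real implementation picks the winner with a hash rather than a plain name sort, and these identifiers are made up), but it reproduces the failure mode:

package main

import (
	"fmt"
	"sort"
)

// shouldAnnounce: a node is eligible to announce a service IP only if
// it runs a speaker AND hosts one of the service's endpoints; the
// first eligible node wins the election.
func shouldAnnounce(myNode string, speakers map[string]bool, endpointNodes []string) bool {
	var usable []string
	for _, n := range endpointNodes {
		if speakers[n] {
			usable = append(usable, n)
		}
	}
	sort.Strings(usable)
	// If every endpoint lives on a non-speaker node, usable is empty
	// and no speaker announces the IP at all -- the failure seen here.
	return len(usable) > 0 && usable[0] == myNode
}

func main() {
	speakers := map[string]bool{"pve2": true, "pve3": true}
	// Endpoint pod scheduled on pve1, which runs no speaker:
	fmt.Println(shouldAnnounce("pve2", speakers, []string{"pve1"})) // false
	fmt.Println(shouldAnnounce("pve3", speakers, []string{"pve1"})) // false
	// Endpoint on a speaker node: exactly one speaker answers.
	fmt.Println(shouldAnnounce("pve2", speakers, []string{"pve2"})) // true
}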

michaelfig changed the title from "metallb layer2 can only connect within the cluster" to "metallb layer2 only announces when speaker nodes overlap with service pod nodes" on Oct 4, 2018
michaelfig changed the title from "metallb layer2 only announces when speaker nodes overlap with service pod nodes" to "layer2 only announces when speaker nodes overlap with service pod nodes" on Oct 4, 2018
danderson (Contributor) commented:

L2 mode doesn't work correctly if you have a cluster with multiple subnets. This is a hard limitation of the network protocols, nothing I can fix. If your cluster is large enough to have multiple subnets, you will need to use BGP mode to distribute IPs.


tbrtje commented Jun 25, 2020

This issue shouldn't be closed. I cannot run metallb on all of my nodes, so I only run it on a subset. Any workload that isn't running on one of the metallb nodes won't be announced by metallb.
This has nothing to do with subnets.

johananl (Member) commented:

Hello @tbrtje. As @danderson said, if your requirement is to announce a service from a node which doesn't run a MetalLB speaker, you can achieve that in BGP mode with externalTrafficPolicy set to Cluster. If you'd like to see new functionality introduced into MetalLB, feel free to open an issue describing the desired functionality in detail as well as the rationale for adding it to the project. Doing so will help the maintainers figure out whether the change makes sense and can be implemented in a reasonable manner.
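
For reference, a hypothetical one-liner for setting that policy on the service from the original report (externalTrafficPolicy is a standard Kubernetes Service field; substitute your own service name):

$ kubectl patch svc ingress-nginx-ingress-controller -p '{"spec":{"externalTrafficPolicy":"Cluster"}}'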
