Multiple services with same IP announced multiple times #558

Closed
praseodym opened this issue Mar 23, 2020 · 16 comments

@praseodym

When using two services (TCP+UDP) with the Local traffic policy, the same shared IP and the same set of pods (as described in the documentation), MetalLB in Layer 2 mode will sometimes announce the LB IP from two different nodes.

A workaround is setting externalIPs on one of the services to the shared virtual IP, and using MetalLB on just one of the services.

This was previously reported in #530 (comment).

@praseodym
Author

praseodym commented Mar 23, 2020

See https://gist.github.com/praseodym/842ab37c6a716926ef0d8e87de1a1eaa for a simple reproducer. It might need to be run several times to hit the bug.

Relevant output:

9s          Normal    IPAllocated               service/nginx-a               Assigned IP "172.17.0.240"
7s          Normal    nodeAssigned              service/nginx-a               announcing from node "kind-worker"
5s          Normal    nodeAssigned              service/nginx-a               announcing from node "kind-worker3"
9s          Normal    IPAllocated               service/nginx-b               Assigned IP "172.17.0.240"
5s          Normal    nodeAssigned              service/nginx-b               announcing from node "kind-worker"

@champtar
Contributor

https://github.com/metallb/metallb/blob/main/speaker/layer2_controller.go#L78
we need to use the sharing key metallb.universe.tf/allow-shared-ip instead of the service name
(I haven't looked into where to fix it)
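For context, here is a minimal, self-contained sketch of the selection scheme the linked line implements (illustrative only, not the actual MetalLB source; the node names and the "#" separator are assumptions): each speaker sorts the usable nodes by a hash of node name plus service name and announces only if its own node sorts first. Because the service name feeds the hash, two services sharing one IP can elect different nodes.

```go
package main

import (
	"bytes"
	"crypto/sha256"
	"fmt"
	"sort"
)

// electNode models the layer 2 winner election: sort the usable nodes by
// a per-service hash and announce from whichever node sorts first. The
// service name is part of the hash key, which is the suspected bug.
func electNode(serviceName string, nodes []string) string {
	sort.Slice(nodes, func(i, j int) bool {
		hi := sha256.Sum256([]byte(nodes[i] + "#" + serviceName))
		hj := sha256.Sum256([]byte(nodes[j] + "#" + serviceName))
		return bytes.Compare(hi[:], hj[:]) < 0
	})
	return nodes[0]
}

func main() {
	nodes := []string{"kind-worker", "kind-worker2", "kind-worker3"}
	// nginx-a and nginx-b share one LB IP, but their names hash
	// differently, so the two elections can pick different nodes.
	fmt.Println(electNode("nginx-a", append([]string(nil), nodes...)))
	fmt.Println(electNode("nginx-b", append([]string(nil), nodes...)))
}
```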

@rata
Contributor

rata commented Mar 23, 2020

Probably, yes! :) But I think we should try to reproduce first, just in case :-)

@liuyuan10

I can reproduce this bug. In L2 mode, the same IP might be shared by multiple services and announced from different nodes. Sharing IPs in L2 mode should be disabled until we can make sure that only services with the same master node can share IPs.

@rata
Copy link
Contributor

rata commented Mar 28, 2020

@liuyuan10 thanks for the confirmation! And don't hesitate to send any PRs ;)

I'd prefer to fix the bug, if possible, or maybe document the work-around until the bug is fixed (although it seems quite ugly, IIUC).

Hopefully what @champtar spotted is the cause (I don't see why not), and hashing based on the IP might give us the order we need (need to double check what that change can imply, hopefully just a new order that respects the shared IP, but haven't thought it through).

Also, if @champtar's theory is correct, this bug seems to have been present for years (2a75be5), so I don't think we need to rush to disable IP sharing right away. We can try to fix it or document a workaround, IMHO.

What do you think?

@rata rata added the bug label Mar 28, 2020
@liuyuan10

Could we simply use the IP in the hash instead of the service name? It should work for traffic-local services, given that they need to use the same set of pods to share IPs. For traffic-cluster services this doesn't work, because only nodes running the pods are candidates. Why can't we choose all active nodes as candidates for traffic-cluster services?
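A minimal sketch of that idea, following the election sketch above (the IP is the one from the reproducer; everything else is assumed): keying the hash on the allocated IP means every service sharing that IP sorts its candidates identically, so they elect the same node whenever their candidate lists match.

```go
package main

import (
	"bytes"
	"crypto/sha256"
	"fmt"
	"sort"
)

// electNodeByIP is the proposed variant: key the hash on the allocated
// LB IP instead of the service name. Every service sharing the IP then
// sorts the same candidate list identically and elects the same node,
// provided the candidate lists themselves are identical.
func electNodeByIP(lbIP string, nodes []string) string {
	sort.Slice(nodes, func(i, j int) bool {
		hi := sha256.Sum256([]byte(nodes[i] + "#" + lbIP))
		hj := sha256.Sum256([]byte(nodes[j] + "#" + lbIP))
		return bytes.Compare(hi[:], hj[:]) < 0
	})
	return nodes[0]
}

func main() {
	nodes := []string{"kind-worker", "kind-worker2", "kind-worker3"}
	// nginx-a and nginx-b share 172.17.0.240, so with identical
	// candidate lists both elections pick the same node.
	fmt.Println(electNodeByIP("172.17.0.240", append([]string(nil), nodes...)))
	fmt.Println(electNodeByIP("172.17.0.240", append([]string(nil), nodes...)))
}
```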

@rata
Contributor

rata commented Mar 28, 2020

For externalTrafficPolicy: Cluster in layer 2 mode, we can't announce from several nodes, as they have different MAC addresses and would confuse the clients. With only one node, a failover might not be so graceful (https://metallb.universe.tf/concepts/layer2/), and if failovers happen constantly I don't expect it to really work.

Or am I missing something?

@liuyuan10

I mean treating all active nodes as master selection candidates instead of just those with a pod running. That way, traffic-cluster services will have the same set of selection candidates, and we can use node + IP to sort the candidates.

@champtar
Contributor

@liuyuan10 if all pods are down, not having the LoadBalancer is not really an issue

@champtar
Contributor

@rata I don't want to use the IP; I want to use metallb.universe.tf/allow-shared-ip instead of the name when it is present, so we don't change the behavior at all when metallb.universe.tf/allow-shared-ip is not used.
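A sketch of that key selection, assuming a hypothetical helper sharingKey over the Kubernetes Service object (the helper name and placement are assumptions, not MetalLB code):

```go
package main

import (
	"fmt"

	v1 "k8s.io/api/core/v1"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
)

// sharingKey implements the selection described above: hash on the
// metallb.universe.tf/allow-shared-ip annotation when it is set, and fall
// back to the service name otherwise, so services that don't share an IP
// keep today's behavior.
func sharingKey(svc *v1.Service) string {
	if key, ok := svc.Annotations["metallb.universe.tf/allow-shared-ip"]; ok {
		return key
	}
	return svc.Name
}

func main() {
	svc := &v1.Service{ObjectMeta: metav1.ObjectMeta{
		Name:        "nginx-a",
		Annotations: map[string]string{"metallb.universe.tf/allow-shared-ip": "nginx"},
	}}
	fmt.Println(sharingKey(svc)) // prints "nginx"
}
```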

@liuyuan10

liuyuan10 commented Mar 28, 2020

if all pods are down, not having the LoadBalancer is not really an issue

It's not about whether the pods are down or not. It's about what the list of candidates for master selection is. Different services will have different candidates (because their pods run on different nodes), so you can't just use the IP or metallb.universe.tf/allow-shared-ip in the hash.

I don't want to use the IP; I want to use metallb.universe.tf/allow-shared-ip

I don't think that works. If two services have the same metallb.universe.tf/allow-shared-ip but are assigned different IPs, they should be able to be announced from different nodes.

@rata
Contributor

rata commented Mar 28, 2020

I mean treat all active nodes as master selection candidates instead of just those with pod running.

Which pods do you refer to? The speakers? The application pods? I guess you mean the nodes returned by usableNodes() here: https://github.com/metallb/metallb/blob/main/speaker/layer2_controller.go#L79?

Can you please elaborate a little bit, just to avoid misunderstandings?

@liuyuan10

Yes: usableNodes() filters the active nodes by service endpoints, so only nodes with service pods running are master selection candidates. Am I getting it right?

If so, then you can't use IP + node in the hash, because different services have different sets of usable nodes (their pods run on different nodes).
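A small illustration of this objection (node names and pod placement are made up): even with an identical, IP-keyed hash, two services whose endpoints sit on different nodes get different candidate lists from usableNodes(), so the elected node can still differ.

```go
package main

import (
	"bytes"
	"crypto/sha256"
	"fmt"
	"sort"
)

// Same IP-keyed election as in the earlier sketch.
func electNodeByIP(lbIP string, nodes []string) string {
	sort.Slice(nodes, func(i, j int) bool {
		hi := sha256.Sum256([]byte(nodes[i] + "#" + lbIP))
		hj := sha256.Sum256([]byte(nodes[j] + "#" + lbIP))
		return bytes.Compare(hi[:], hj[:]) < 0
	})
	return nodes[0]
}

func main() {
	// Hypothetical placement: svc-a's pods run on workers 1 and 2,
	// svc-b's pods run on workers 2 and 3. usableNodes() would yield
	// different candidate lists, so even with an identical key the two
	// elections can pick different nodes.
	fmt.Println(electNodeByIP("172.17.0.240", []string{"kind-worker", "kind-worker2"}))
	fmt.Println(electNodeByIP("172.17.0.240", []string{"kind-worker2", "kind-worker3"}))
}
```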

@rata
Contributor

rata commented Mar 28, 2020

I'm not familiar with the code, but I think that is correct too (at least if both services use externalTrafficPolicy: Cluster; if they use Local, they need to match the exact same set of pods, and then it will of course work).

The endpoint filtering in usableNodes() that you were asking about seems (after a quick search) to be there so that externalTrafficPolicy: Local works: ae870dc (see the complete PR for other changes). It looks like a simple way of making Local work while still distributing services across nodes under Cluster (although, as you could reproduce, it fails with shared IPs).

@liuyuan10

This is what I mean:
#562

I haven't tested it and would like to get comments.

@russellb
Collaborator

fixed by #976
