
Allow multiple (compatible) services to use the same loadBalancerIP #121

Closed · danderson opened this issue Dec 27, 2017 · 27 comments

danderson commented Dec 27, 2017

Is this a bug report or a feature request?:

Feature request.

What happened:

@miekg tried to set up his home DNS service on Kubernetes, with MetalLB. This requires a service that routes both TCP and UDP port 53 to a deployment.

Unfortunately, due to kubernetes/kubernetes#23880, services are not allowed to mix protocols, for poorly explained reasons involving GCP's load-balancing implementation.

The workaround on GCP is: create 2 Services, one for each protocol, and make them use the same loadBalancerIP, by preallocating a static IP in GCP.

This doesn't work in MetalLB, because we have a 1:1 mapping of Service to IP. Annoyingly, MetalLB trivially "supports" multiprotocol services, so the correct solution would be to remove that constraint in the vanilla k8s control plane, but...

What you expected to happen:

MetalLB should have a way to force the allocation of the same IP to multiple services, as long as those services use disjoint sets of (proto, port).
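
As an illustration, a minimal sketch (not MetalLB's actual code) of the compatibility check being asked for here: two services may share an IP only if their (proto, port) sets are disjoint.

package main

import "fmt"

// Port is a (protocol, port) pair; tcp/53 and udp/53 are different Ports.
type Port struct {
    Proto string // "tcp" or "udp"
    Port  int
}

// compatible reports whether two port sets are disjoint, i.e. whether two
// services using them could safely share one load-balancer IP.
func compatible(a, b []Port) bool {
    seen := make(map[Port]bool, len(a))
    for _, p := range a {
        seen[p] = true
    }
    for _, p := range b {
        if seen[p] {
            return false // same proto+port on both services: conflict
        }
    }
    return true
}

func main() {
    fmt.Println(compatible([]Port{{"tcp", 53}}, []Port{{"udp", 53}})) // true
    fmt.Println(compatible([]Port{{"tcp", 53}}, []Port{{"tcp", 53}})) // false
}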

How to reproduce it (as minimally and precisely as possible):

Create a service, pointing at whatever backend, serving ports tcp/53 and udp/53. Observe that k8s rejects the service. Create separate services, one for tcp/53 and one for udp/53. This time k8s accepts them, but MetalLB allocates separate IPs. Try setting the same loadBalancerIP on both services, and notice that MetalLB does not allow this.

miekg self-assigned this Dec 27, 2017
miekg commented Dec 27, 2017

Seems that instead of using net.IP we should use net.Addr, which also carries the network (protocol).

@danderson

net.Addr is really generic, though; it could be any kind of address at all (including a unix domain socket, raw socket address, ...). So it would need a bunch of extra validation to make sure the addresses make sense.

miekg commented Dec 27, 2017

yep sure, but you can't ask for those from k8s :-)

@danderson

Oh, I see. Hm, yeah, that would work. The allocator lib will need a lot of changes to make that work, I think?...

We also need to figure out how to enable/disable sharing, because sharing has potentially surprising behaviors (e.g. an IP is shared between 2 services, tcp/100 and tcp/200, but then the first service is edited to also have tcp/200... what then?). So it should probably be off by default?

miekg commented Dec 27, 2017

I was (narrowly) focusing on my use case: a service with the same port on 2 different protocols. Different ports open another can of worms.

@danderson

Hmm, yeah, same ports but different protocols has a nice property right now: k8s does not allow the problematic configurations at all. That will change once the upstream bug is fixed, though, so I would still like to think about how to allow the generic solution.

I think a prerequisite for the generic solution is to implement #2. Once we have admission control set up, we can reject unsafe config changes before they are applied.

miekg commented Dec 28, 2017

hmmm, don't we also need to track whether the new service is identical enough to the old service? Pointing to the same things in the cluster, etc.?

Bit more work than anticipated :/

miekg removed their assignment Dec 28, 2017
@danderson

It depends on what kube-proxy programs into the dataplane on the nodes. In theory, there's no problem with pointing different ports of one service IP at different services, as long as kube-proxy doesn't program "default-drop" entries on the nodes.

Easy way to check this: hack up MetalLB (as ugly as you like) so that it sets status.loadBalancer.ingress[0].ip to the same IP for 2 different services, and then inspect iptables on the nodes to see what got programmed (iptables -L -n -v and iptables -L -n -v -t nat). Ideally, in the nat tables, we should see entries for each port of each service, which DNAT to different targets. The critical part is that there should not be any entries for the service IP without a port selector.

miekg mentioned this issue Dec 29, 2017
danderson added this to the v0.4.0 milestone Jan 16, 2018
@fkrauthan

I could really use the feature of different ports sharing the same IP: my VPS provider, for example, assigns just one public IP per VPS and does not offer any way to obtain additional IPs (based on my research, quite a few providers have similar restrictions on their VPS products).

In terms of conflicts, the LoadBalancer creation could simply be marked as failed in case of a conflict, including on an update.

hclaria commented Jan 17, 2018

Hello,

We're looking to host a CoreDNS deployment on our cluster with a LoadBalancer service handled by MetalLB.

This service would need to expose both UDP and TCP protocols on the same external IP.

"b) it doesn't work when I deploy, i.e the -tcp service overwrites the udp one" - #122
@miekg , do you mean kube-proxy's iptables rules are being overwritten by the latest service ? In that case we would need to bump that issue to the sig network or wait for a stable IPVS implementation ?

miekg commented Jan 17, 2018

I've abandoned #122, as I didn't fully understand the code I was working with and the PR didn't work in the end. Mostly waiting for @danderson to do it, or guide me through it :/

This feature does require some fundamental changes in the administration of MetalLB, as ports become significant.

In the meantime I just use my UDP-only load-balanced CoreDNS VIP at home.

@danderson

Okay, I have a tentative design for the "UX" of this feature, and I think I also have a tentative plan for how to refactor MetalLB. If anyone is interested in this, I'd like a sanity check on what the end-user interface would look like.

In the following, when I say "port", I mean "proto+port", so tcp/53 and udp/53 are different "ports".


By default, IPs cannot be shared. There is a 1:1 mapping of service to IP. This is the least confusing default behavior for new users, and it avoids some pitfalls of IP sharing (see below).

To enable sharing of IPs, you tag services with the annotation metallb.universe.tf/allow-shared-ip: "<tag>". Services that have the same non-empty tag, and use different service ports, are allowed to share an IP address.

Note that the tag alone doesn't make them share an IP; it only permits it. If you want to force them onto the same IP, you should additionally use spec.loadBalancerIP to pin them all to a specific address.

So, for the canonical example of DNS, you define 2 services that point to the same backend. One serves on tcp/53, and the other on udp/53. On both services, you set spec.loadBalancerIP: 1.2.3.4 and annotations["metallb.universe.tf/allow-shared-ip"]: "dns". With this config, MetalLB is (a) told that it must allocate 1.2.3.4 to both services, and (b) knows that this is okay, because their allow-shared-ip tags are equal.
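
For concreteness, here is a sketch of those two Services using the k8s.io/api/core/v1 types (the service names and the app: dns selector are assumptions for illustration, not part of the design):

import (
    v1 "k8s.io/api/core/v1"
    metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
)

// dnsService builds one of the two DNS services from the example above:
// same sharing tag, same forced loadBalancerIP, different protocol.
func dnsService(name string, proto v1.Protocol) *v1.Service {
    return &v1.Service{
        ObjectMeta: metav1.ObjectMeta{
            Name: name,
            Annotations: map[string]string{
                // Equal, non-empty tags permit these services to colocate.
                "metallb.universe.tf/allow-shared-ip": "dns",
            },
        },
        Spec: v1.ServiceSpec{
            Type:           v1.ServiceTypeLoadBalancer,
            LoadBalancerIP: "1.2.3.4",                     // force both onto the same IP
            Selector:       map[string]string{"app": "dns"}, // assumed shared backend
            Ports: []v1.ServicePort{
                {Name: name, Protocol: proto, Port: 53},
            },
        },
    }
}

// Usage: dnsService("dns-tcp", v1.ProtocolTCP) and dnsService("dns-udp", v1.ProtocolUDP).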

The tagging mechanism enables more complex setups as well: by using different tag values, you can create different "groups" of IP-sharing services. For example, inside one address pool, you could have 2 services tagged "dns" and 10 services tagged "dev". "dev" services can colocate with other "dev" services, but not with "dns" or untagged services.

To simplify configuration in cases where you want an entire address pool to share IPs as much as possible, you can also set allow-shared-ip: true on address-pools in the MetalLB config, which simply tells MetalLB that the empty tag ("") is valid, so untagged services can share IPs with other untagged services. Other tag semantics stay the same, so you can still define isolated sharing groups if you want; it just flips the default.

Sharing IPs adds the possibility of conflicts. For example, if service A (tcp/80) and service B (tcp/443) share an IP, and then you update service A to add tcp/443 to its port list, now there is a collision between the services. In this case, the service that triggered the collision (service A in this example) loses its allocated IP, and MetalLB goes through the IP allocation process again from scratch, as if service A never had an IP address.

This has the property that it's always the service you changed that might switch to a different IP or fail, not the innocent other service that got colocated. However, we cannot guarantee this behavior in all situations (e.g. in some race conditions, it's possible to have conflicting services already configured when MetalLB first starts - and it doesn't know which is the "bad" service), so there is a big disclaimer: if you share IPs and you mutate service definitions, be careful because you might cause unintended address changes for any colocated service.

In the future, an admission webhook could reduce the window of danger by denying writes that would trigger an unsafe state. See #2.

In BGP mode, IP sharing might force a certain externalTrafficPolicy on the service. If all services on an IP point to the exact same pods (identical label selectors), then you can use any externalTrafficPolicy. If the selectors are different, MetalLB sets the traffic policy for all services on that IP to Cluster.

This is because BGP operates at the IP layer: a machine either receives traffic for all services on the IP, or none of them. This makes the Local traffic policy impossible to use, because we can't guarantee that all services on the machine are healthy - and the alternative, to stop sending traffic to a machine whenever any service has an unhealthy pod, is confusing.
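
A sketch of that selector check (a hypothetical helper, not MetalLB's actual code): Local is only safe when every service sharing the IP selects exactly the same pods.

import (
    "reflect"

    v1 "k8s.io/api/core/v1"
)

// localPolicyAllowed reports whether services sharing one IP may keep
// externalTrafficPolicy=Local. Only identical selectors qualify; any
// mismatch means the traffic policy must be forced to Cluster.
func localPolicyAllowed(svcs []*v1.Service) bool {
    for _, s := range svcs {
        if !reflect.DeepEqual(s.Spec.Selector, svcs[0].Spec.Selector) {
            return false // different backends => force Cluster
        }
    }
    return true
}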


A bunch of questions come out of this design:

  • Is the tag mechanism confusing? It adds an extra layer of possible configuration subtlety.
  • Is the tag mechanism even necessary? In theory, there's no way for a k8s service to know if it's being colocated with someone else, and there's no performance/other benefit to dedicated IPs.
    • Should we just have a flag to enable IP sharing on an entire address pool, and consider all services in the pool eligible for colocation?
    • The complexities of BGP externalTrafficPolicy make me think we want a way to control specifically which services can colocate, not just that colocation is generally allowed.

On the implementation side, it's reasonably simple: change allocator internals+API to support "please allocate an IP for this PortSet, with tag X", and the controller handles the parsing of config + annotations. The speaker is pretty much unchanged, except that it needs to refcount IP announcements (previously, I could assume that controller guaranteed a 1:1 map).
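
For the speaker side, a minimal sketch of that refcounting (a hypothetical type, not the real speaker code): an IP stays announced while at least one service uses it, and is withdrawn only when the count reaches zero.

// announcer tracks how many services are using each announced IP.
type announcer struct {
    refs map[string]int // IP -> number of services sharing it
}

func (a *announcer) add(ip string) {
    if a.refs[ip] == 0 {
        // First user of this IP: start announcing it (ARP/NDP or BGP).
    }
    a.refs[ip]++
}

func (a *announcer) remove(ip string) {
    a.refs[ip]--
    if a.refs[ip] <= 0 {
        delete(a.refs, ip)
        // Last user gone: withdraw the announcement.
    }
}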

Feedback very welcome, please. Is this good? Is this dumb? Other ideas?

miekg commented Feb 21, 2018

First thoughts:

  • I like the proposal
  • But it feels very much bolted on because of some crappy k8s decision.

mikioh commented Feb 21, 2018

Just a quick glance: it looks like the required feature is anycast IP traffic support for the load balancer.

@danderson

@miekg I agree that using this for DNS is just a crappy workaround. Unfortunately fixing the upstream bug is too much work for me, so this is the best I can do :(

However, @fkrauthan above provided a different use case for this sharing mechanism: environments where you don't have enough IPs, and using different ports is okay. I think that use case is not too hacky, so I was trying to think about that case when designing this. And the solution happens to also provide a hacky workaround for the DNS thing, but that's "accidental" :).

In that light... Does the feature still feel bolted on? Any thoughts on how I could make it feel more normal?

@mikioh I don't understand what you mean. MetalLB already effectively implements anycast when you use it in BGP mode, in the sense that it encourages the router to balance across multiple machines for a single IP. Did you mean something else?

miekg commented Feb 22, 2018

My feeling with "bolted on" basically comes down to wishing that I could just specify this in the spec.Ports section of the Service definition and it "would just work". But that is an even bigger wish than kubernetes/kubernetes#23880, which only asks for multiple protocols.

danderson removed this from the v0.4.0 milestone Feb 28, 2018
mx-shift commented Mar 8, 2018

FWIW, I ran into this issue when deploying Ubiquiti Networks' UniFi Controller (docker image). The access points expect the same controller IP to provide both tcp/8443 (among others) and udp/3478. The latter is for STUN, while the former is how the AP communicates with the controller. There is no way to specify a separate STUN IP. My deployment seems to work for basic network access for now, but the controller raises an alert about STUN being unreachable.

danderson added this to the v0.5.0 milestone Mar 8, 2018
@danderson

Thanks for the intel! I'm still annoyed that Kubernetes has an arbitrary restriction imposed by GCP, but that's not something I'm going to be able to fix overnight. In the meantime, I'm slating this change for 0.5.0 (it might end up in 0.6.0 if I make 0.5 a fast release with minimal changes).

@danderson

More thoughts as I start figuring out the implementation...

The current allocator has 3 allocation methods: Assign (give me this exact IP), AllocatePool (any IP from pool X), and Allocate (any IP).

I think these can be refactored into a single Allocate call that takes a set of constraints:

type Request struct {
    Service string
    Pool    string // "" means "any"
    IP      net.IP // nil means "any"
}

This has the benefit that we can start providing more data to the allocation request.
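
For example (a sketch, assuming a single Allocate(Request) entry point; the alloc variable and the service name are hypothetical), the three existing methods collapse into:

alloc.Allocate(Request{Service: "default/dns"})                             // Allocate: any IP
alloc.Allocate(Request{Service: "default/dns", Pool: "pool-a"})             // AllocatePool: any IP from pool
alloc.Allocate(Request{Service: "default/dns", IP: net.ParseIP("1.2.3.4")}) // Assign: exactly this IP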

Then, for IP sharing, we need 3 more pieces of information:

  • List of ports. Used to ensure we don't set up conflicting services on the same IP.
  • Sharing key. This is the "tag" I described in the design comment above; it is how the user explicitly controls which services can colocate with each other.
  • Backend key. This is the stringified form of the service's Selector. We can use this to ensure that, when using the Local traffic policy, only services that point to the exact same backend can be colocated (see the sketch below).
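
Putting those together, a sketch of how the constraints might extend Request (field names are assumptions, not a final API):

import "net"

// Port is a (protocol, port) pair, e.g. {"tcp", 53}.
type Port struct {
    Proto string
    Port  int
}

type Request struct {
    Service    string
    Pool       string // "" means "any pool"
    IP         net.IP // nil means "any IP"
    Ports      []Port // used to reject conflicting (proto, port) colocations
    SharingKey string // the user-supplied tag gating which services may share
    BackendKey string // stringified Selector; gates colocation under Local policy
}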

I'm not entirely happy with this, because it adds a lot of complexity to the allocator logic... But I'm going to play with it in my client and see what it looks like in the end.

@danderson

Hrm, it doesn't look that nice so far, and it doesn't avoid the fundamental complexity that needs to be added to the library :(.

Need to think more about how to do this sanely...

danderson modified the milestones: v0.5.0, v0.6.0 Apr 1, 2018
danderson commented Apr 1, 2018

I've got the core of the allocator somewhat done, and opened WIP PR #209 to track the work publicly.

Still missing:

  • Add sharing-specific test cases to internal/allocator
  • Stats need to be wired back into allocator
  • More graceful assignment changes for existing assignments (allow some port/sharing key mutations)
  • Plumb the sharing logic into the rest of MetalLB
  • Refcount announcements in speaker
  • Correctly set backend key for externalTrafficPolicy=Cluster
  • Document on website
  • Document in 0.6 release notes

danderson added a commit that referenced this issue Apr 2, 2018
The new allocator doesn't export stats in this change, and the ability to share IPs isn't plumbed out to the rest of MetalLB yet.
@danderson

@miekg If you want to test this with metallb's master build, you can probably run CoreDNS with tcp/53 now :)

@cbarratt

Apologies for the newcomer k8s/MetalLB query here. I ended up on this issue while trying to figure out whether I can use MetalLB for ingress and route based on URL or host. I feel like this may solve the issue to some degree, by hosting all services on the same IP but different ports, but it's still not 100% what I'm aiming for.

pdf commented Apr 11, 2018

@cbarratt this is probably not the ideal forum for these queries, but I think one of the standard ingress controllers is what you're looking for - they're L7-aware and can take care of routing to your services by hostname. MetalLB can help get traffic to your ingress if you're not on one of the cloud providers that provide a load-balancer service (like AWS/GCE/etc.). I'd suggest you take a look at the ingress documentation.

@steven-sheehy

"allow-shared-ip: true on address-pools in the MetalLB config"

Did this global flag make it into this pull request?

@danderson

@steven-sheehy No, it didn't. It complicated the implementation a bit, and I wasn't sure anybody cared, so I left it out. Is this a request to add it? :)

If so, I'll open a separate bug and implement it for 0.6.

@steven-sheehy

@danderson I've been playing around with master and I think the annotation approach will suit my needs. Thanks!

fedepaol referenced this issue in fedepaol/metallb Jun 7, 2023
OCPBUGS-13678: Refactor the FRR configuration