
Support mixed protocols in service.type=loadbalancer #23880

Closed
bprashanth opened this issue Apr 5, 2016 · 63 comments

@bprashanth commented Apr 5, 2016

It should be possible to use a single IP to direct traffic to multiple protocols with a single Service of Type=Loadbalancer. We just need to:

  1. Promote the ephemeral IP to a static IP
  2. Create another forwarding rule with the same static IP, but a different protocol/port

Unfortunately we need this dance because a single forwarding rule only supports one protocol.

When we do this we should make sure the firewall rules are opened up for the right protocol: https://github.com/kubernetes/kubernetes/blob/master/pkg/cloudprovider/providers/gce/gce.go#L927 (that just takes the first protocol for the firewall rule).

Is there a reason we didn't do this initially?
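The two-step dance above would look roughly like this with gcloud (the address, rule names, region, and target pool here are made up for illustration; promoting the ephemeral IP is done by reserving a static address with the same value):

```shell
# Assume an existing forwarding rule got the ephemeral IP 203.0.113.10
# in region us-central1 (address, names, and region are hypothetical).

# 1. Promote the ephemeral IP to a static (reserved) address
gcloud compute addresses create my-lb-ip \
    --addresses 203.0.113.10 \
    --region us-central1

# 2. Create a second forwarding rule on the same IP, different protocol
gcloud compute forwarding-rules create my-lb-udp \
    --region us-central1 \
    --address 203.0.113.10 \
    --ip-protocol UDP \
    --ports 53 \
    --target-pool my-lb-pool
```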

@vsimon (Contributor) commented Jun 6, 2016

It would be very nice to support this.

@samuraisam commented Jul 15, 2016

Does #24090 support this?

@therc (Contributor) commented Jul 15, 2016

For AWS, you can use a combination of #23495 and #26268

@samuraisam commented Jul 18, 2016

Can it be accomplished on GCE?

@thomasbarton commented Dec 23, 2016

@samuraisam I was able to work around this limitation by creating two separate services of type=LoadBalancer, one for TCP and one for UDP. Using the parameter loadBalancerIP: XXX.XXX.XXX.XXX with the same IP address for both, it works. I'm posting this in case anyone else runs into the same issue.

@ensonic commented Mar 15, 2017

@thomasbarton Did you run it once, check what IP it got and then hard-coded the IP?

@thomasbarton commented Mar 15, 2017

@ensonic On GCE you can do that, but I recommend reserving the IP address first and making it static, then creating both services with loadBalancerIP set.
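Concretely, the two-service workaround might look like the manifests below (the service names, selector, and IP are placeholders; the pre-reserved static IP goes into both loadBalancerIP fields):

```yaml
# Hypothetical example: two services sharing one pre-reserved static IP.
kind: Service
apiVersion: v1
metadata:
  name: my-dns-tcp
spec:
  selector:
    app: my-dns
  type: LoadBalancer
  loadBalancerIP: 203.0.113.10   # the reserved static IP
  ports:
  - name: dns-tcp
    port: 53
    protocol: TCP
    targetPort: 53
---
kind: Service
apiVersion: v1
metadata:
  name: my-dns-udp
spec:
  selector:
    app: my-dns
  type: LoadBalancer
  loadBalancerIP: 203.0.113.10   # same IP, UDP this time
  ports:
  - name: dns-udp
    port: 53
    protocol: UDP
    targetPort: 53
```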

@fejta-bot commented Dec 22, 2017

Issues go stale after 90d of inactivity.
Mark the issue as fresh with /remove-lifecycle stale.
Stale issues rot after an additional 30d of inactivity and eventually close.

Prevent issues from auto-closing with an /lifecycle frozen comment.

If this issue is safe to close now please do so with /close.

Send feedback to sig-testing, kubernetes/test-infra and/or @fejta.
/lifecycle stale

@ensonic commented Dec 22, 2017

/remove-lifecycle stale
This would still be nice to have.

@ffledgling commented Feb 5, 2018

Bump. This would be really nice to have!

@fejta-bot commented Jun 14, 2018

Issues go stale after 90d of inactivity.
Mark the issue as fresh with /remove-lifecycle stale.
Stale issues rot after an additional 30d of inactivity and eventually close.

If this issue is safe to close now please do so with /close.

Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/lifecycle stale

@ffledgling commented Jun 14, 2018

/remove-lifecycle stale

@kiall (Contributor) commented Jul 25, 2018

As a very very quick hack, I simply removed the validation code that enforces only a single protocol:

diff --git a/pkg/apis/core/validation/validation.go b/pkg/apis/core/validation/validation.go
index 7050c604e5..7747d527fc 100644
--- a/pkg/apis/core/validation/validation.go
+++ b/pkg/apis/core/validation/validation.go
@@ -3714,9 +3714,6 @@ func ValidateService(service *core.Service) field.ErrorList {
                                includeProtocols.Insert(string(service.Spec.Ports[i].Protocol))
                        }
                }
-               if includeProtocols.Len() > 1 {
-                       allErrs = append(allErrs, field.Invalid(portsPath, service.Spec.Ports, "cannot create an external load balancer with mix protocols"))
-               }
        }
 
        if service.Spec.Type == core.ServiceTypeClusterIP {

After this, I rebuilt kube-apiserver, and with MetalLB, mixed protocol load balancers now work just fine:

NAME               TYPE           CLUSTER-IP      EXTERNAL-IP   PORT(S)                       AGE
mixed-protocols    LoadBalancer   172.29.12.226   172.29.32.9   53:31874/TCP,53:31874/UDP     3m

I think the best course forward is to:

  1. Remove this test
  2. In all existing LB implementations, add an equivalent of this validation. Likely emitting an event against the LB, rather than failing the creation of the resource in the first place.

Any objections? I'm happy to implement when I have a little time.

Updated: After a few minutes, I spotted this event:

Type     Reason                Age              From                             Message
----     ------                ----             ----                             -------
Normal   IPAllocated           7m               metallb-controller               Assigned IP "172.29.32.9"
Warning  PortAlreadyAllocated  1m (x3 over 7m)  portallocator-repair-controller  Port 31874 was assigned to multiple services; please recreate service

Everything still works; I suspect the portallocator-repair-controller needs an update to consider both port and protocol, rather than just port.
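For what it's worth, the fix suggested above amounts to keying allocations by (port, protocol) instead of port alone. A minimal sketch of the idea — the type and names here are illustrative, not the actual Kubernetes portallocator code:

```go
package main

import "fmt"

// protocolPort keys an allocation by (port, protocol) instead of port
// alone. This is the idea behind fixing the PortAlreadyAllocated warning:
// TCP/31874 and UDP/31874 are distinct allocations, not a conflict.
type protocolPort struct {
	port     int
	protocol string // "TCP" or "UDP"
}

func main() {
	seen := map[protocolPort]bool{}
	requests := []protocolPort{
		{31874, "TCP"},
		{31874, "UDP"}, // same port, different protocol: fine
		{31874, "TCP"}, // genuine duplicate
	}
	for _, r := range requests {
		if seen[r] {
			fmt.Printf("conflict: %d/%s\n", r.port, r.protocol)
			continue
		}
		seen[r] = true
	}
	// prints: conflict: 31874/TCP
}
```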

@tuminoid commented Sep 10, 2018

We're getting this complaint from the portallocator-repair-controller on 1.10.4 as well, on a NodePort service that exposes both a TCP port and a UDP port on the same NodePort.

@fejta-bot commented Dec 9, 2018

Issues go stale after 90d of inactivity.
Mark the issue as fresh with /remove-lifecycle stale.
Stale issues rot after an additional 30d of inactivity and eventually close.

If this issue is safe to close now please do so with /close.

Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/lifecycle stale

@marcoshuck commented Jun 6, 2020

+1, we need this feature.

@neutrongenious commented Jun 12, 2020

+1 for Metallb

@ffledgling commented Jun 16, 2020

+1 for MetalLB. IP pools are often limited in bare-metal deployments, and having two different LBs per protocol makes configuration especially annoying for applications that talk over both UDP and TCP.

@sarabjeetdhawan commented Jun 24, 2020

I desperately need this on my MetalLB setup. This is needed for the DNS pod serving on both TCP and UDP port 53.

@adampl commented Jun 24, 2020

Comments should rather go to the KEP issue: kubernetes/enhancements#1435 or the PR: kubernetes/enhancements#1438

@sarabjeetdhawan commented Jun 24, 2020

I desperately need this on my MetalLB setup. This is needed for the DNS pod serving on both TCP and UDP port 53.

Actually, this workaround of creating two separate services with the metallb.universe.tf/allow-shared-ip: "true" annotation works fine. Just tested it:

#23880 (comment)

@sarabjeetdhawan commented Jun 24, 2020

I desperately need this on my MetalLB setup. This is needed for the DNS pod serving on both TCP and UDP port 53.

Actually, this workaround of creating two separate services with the metallb.universe.tf/allow-shared-ip: "true" annotation works fine. Just tested it:

#23880 (comment)

OK, do NOT try this workaround. It broke my MetalLB. It worked fine for a few minutes, and now the IPs are not accessible at all. In fact, even after I removed the service and recreated it the old way (using just TCP), I still can't access the services at all, even though they show up as running on the master.

@zimmertr commented Jun 24, 2020

I've been using that strategy in production for a year without issue.

@brandond commented Jun 24, 2020

@sarabjeetdhawan that's because you're not supposed to set it to 'true' everywhere, you're supposed to set it to a unique key for each set of services that you want to share an IP. If you set them all to the same thing ('true', for example) it will try to use a single IP for everything. This is covered in the docs.

@sarabjeetdhawan commented Jun 24, 2020

@sarabjeetdhawan that's because you're not supposed to set it to 'true' everywhere, you're supposed to set it to a unique key per service that you want to share an IP. If you set all your LBs to the same thing ('true', for example) it will try to use a single IP for everything. This is covered in the docs.

Hmm, this is what my YAML file looked like. Any ideas how to fix it back so that it can at least start working with one protocol (TCP or UDP)?

kind: Service
apiVersion: v1
metadata:
  name: testdns-tcp-service
  annotations:
    metallb.universe.tf/allow-shared-ip: "true"
spec:
  selector:
    app: testdns
  type: LoadBalancer
  loadBalancerIP: 10.108.35.148
  ports:
  - name: dnstcp
    port: 53
    protocol: TCP
    targetPort: 53
  - name: webmin
    port: 10000
    protocol: TCP
    targetPort: 10000
---
kind: Service
apiVersion: v1
metadata:
  name: testdns-udp-service
  annotations:
    metallb.universe.tf/allow-shared-ip: "true"
spec:
  selector:
    app: testdns
  type: LoadBalancer
  loadBalancerIP: 10.108.35.148
  ports:
  - name: dnsudp
    port: 53
    protocol: UDP
    targetPort: 53

@miekg commented Jun 24, 2020

That's a fairly baseless claim that warrants evidence.

@sarabjeetdhawan commented Jun 24, 2020

Maybe I did something wrong there as @brandond just mentioned. If I did, I am just trying to understand exactly how to revert that change and do it the right way.

@brandond commented Jun 24, 2020

This is not the right place for a tutorial on proper use of MetalLB anyway. This belongs in a Slack channel somewhere, at best.

@sarabjeetdhawan commented Jun 24, 2020

@sarabjeetdhawan that's because you're not supposed to set it to 'true' everywhere, you're supposed to set it to a unique key for each set of services that you want to share an IP. If you set them all to the same thing ('true', for example) it will try to use a single IP for everything. This is covered in the docs.

I understand. I am now using a completely different "key" for each service pair in this way:

e.g. for ss-dns pod:

  name: ss-dns-tcp-service
  annotations:
    metallb.universe.tf/allow-shared-ip: "ss-dns"

  name: ss-dns-udp-service
  annotations:
    metallb.universe.tf/allow-shared-ip: "ss-dns"

ss-dns-tcp-service   LoadBalancer   10.100.71.40     10.108.35.147   53:30428/TCP,10000:32653/TCP              7m6s
ss-dns-udp-service   LoadBalancer   10.100.89.48     10.108.35.147   53:30871/UDP                              7m6s

However, I just cannot reach these ports (53/10000) at all anymore, even when I am not using this workaround (e.g. using just one TCP service).

I even rebooted my master and I see this now:

# systemctl status kubelet -l
● kubelet.service - kubelet: The Kubernetes Node Agent
   Loaded: loaded (/usr/lib/systemd/system/kubelet.service; enabled; vendor preset: disabled)
  Drop-In: /usr/lib/systemd/system/kubelet.service.d
           └─10-kubeadm.conf
   Active: active (running) since Wed 2020-06-24 13:25:08 CDT; 11min ago
     Docs: https://kubernetes.io/docs/
 Main PID: 12128 (kubelet)
   CGroup: /system.slice/kubelet.service
           ‣ 12128 /usr/bin/kubelet --bootstrap-kubeconfig=/etc/kubernetes/bootstrap-kubelet.conf --kubeconfig=/etc/kubernetes/kubelet.conf --config=/var/lib/kubelet/config.yaml --cgroup-driver=systemd --network-plugin=cni --pod-infra-container-image=k8s.gcr.io/pause:3.1 --runtime-cgroups=/systemd/system.slice --kubelet-cgroups=/systemd/system.slice

Jun 24 13:25:17 master01 kubelet[12128]: W0624 13:25:17.114824   12128 docker_sandbox.go:384] failed to read pod IP from plugin/docker: NetworkPlugin cni failed on the status hook for pod "kubernetes-dashboard-5f7b999d65-j7dtd_kube-system": CNI failed to retrieve network namespace path: cannot find network namespace for the terminated container "0841e0216e231475099599e8a1d79b25f09f3fd53bf10240b8fcea0d520ff106"
Jun 24 13:25:17 master01 kubelet[12128]: W0624 13:25:17.122364   12128 pod_container_deletor.go:75] Container "0841e0216e231475099599e8a1d79b25f09f3fd53bf10240b8fcea0d520ff106" not found in pod's containers
Jun 24 13:25:17 master01 kubelet[12128]: W0624 13:25:17.124712   12128 cni.go:309] CNI failed to retrieve network namespace path: cannot find network namespace for the terminated container "0841e0216e231475099599e8a1d79b25f09f3fd53bf10240b8fcea0d520ff106"
Jun 24 13:25:17 master01 kubelet[12128]: W0624 13:25:17.126154   12128 docker_sandbox.go:384] failed to read pod IP from plugin/docker: NetworkPlugin cni failed on the status hook for pod "coredns-fb8b8dccf-whsnm_kube-system": CNI failed to retrieve network namespace path: cannot find network namespace for the terminated container "f3245ddcb77ff95e11114a39bd54c15fa5298e70b8f2bd4a06b8271519e6bf2f"
Jun 24 13:25:17 master01 kubelet[12128]: W0624 13:25:17.133650   12128 pod_container_deletor.go:75] Container "f3245ddcb77ff95e11114a39bd54c15fa5298e70b8f2bd4a06b8271519e6bf2f" not found in pod's containers
Jun 24 13:25:17 master01 kubelet[12128]: W0624 13:25:17.136162   12128 cni.go:309] CNI failed to retrieve network namespace path: cannot find network namespace for the terminated container "f3245ddcb77ff95e11114a39bd54c15fa5298e70b8f2bd4a06b8271519e6bf2f"
Jun 24 13:25:17 master01 kubelet[12128]: W0624 13:25:17.137636   12128 docker_sandbox.go:384] failed to read pod IP from plugin/docker: NetworkPlugin cni failed on the status hook for pod "coredns-fb8b8dccf-hmfwg_kube-system": CNI failed to retrieve network namespace path: cannot find network namespace for the terminated container "9b06ae806be2850573c37802c49bc11294f92b3b469c3b3ce51a225821288561"
Jun 24 13:25:17 master01 kubelet[12128]: W0624 13:25:17.147974   12128 pod_container_deletor.go:75] Container "9b06ae806be2850573c37802c49bc11294f92b3b469c3b3ce51a225821288561" not found in pod's containers
Jun 24 13:25:17 master01 kubelet[12128]: W0624 13:25:17.151347   12128 cni.go:309] CNI failed to retrieve network namespace path: cannot find network namespace for the terminated container "9b06ae806be2850573c37802c49bc11294f92b3b469c3b3ce51a225821288561"
Jun 24 13:25:18 master01 kubelet[12128]: E0624 13:25:18.805980   12128 cadvisor_stats_provider.go:403] Partial failure issuing cadvisor.ContainerInfoV2: partial failures: ["/kubepods.slice/kubepods-pod12dd0718_8d45_11e9_854a_0050568eff87.slice/docker-b8b0073f85dcc90fe25b59a8d1e8cb1990cbb1d75432a0121075a149f12f3301.scope": RecentStats: unable to find data in memory cache]

I am in a bind; not sure what happened there. Any help is appreciated.

@zimmertr commented Jun 24, 2020

Please understand that every time you leave a comment here, you send a notification to dozens of people, many of whom are only subscribed to this issue to track its closure.

GitHub's Issue tracker is not a place where you should seek support for software. There are dedicated channels for this. Including a public Slack server with a channel for MetalLB.

https://kubernetes.slack.com/messages/metallb

If you struggle there, which you won't because Dave Anderson and other brilliant minds are active there, you can always fall back to the Mailing List.

https://groups.google.com/forum/#!forum/metallb-users

@sarabjeetdhawan commented Jun 24, 2020

I apologize for flooding this thread with all this information. I just want to update for other folks who may get spooked by looking at my original comment about that workaround breaking my MetalLB.

By incorrectly sharing the same key between two different deployments, I broke the whole thing (thanks @brandond for pointing that out). I had to reboot the master and nodes for them to start accepting connections on those ports again. It's working fine now.

If y'all like, I can clean up this thread by deleting all my comments. Please let me know.

@fejta-bot commented Sep 22, 2020

Issues go stale after 90d of inactivity.
Mark the issue as fresh with /remove-lifecycle stale.
Stale issues rot after an additional 30d of inactivity and eventually close.

If this issue is safe to close now please do so with /close.

Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/lifecycle stale

@onedr0p commented Sep 22, 2020

/remove-lifecycle stale

@maranmaran commented Jan 12, 2021

/remove-lifecycle stale
This would still be nice to have.

@TBBle commented Jan 12, 2021

Indeed. And #94028 implemented it as alpha for Kubernetes 1.20; it's still up to the actual Load Balancer implementations to support it, e.g., EKS/AWS needs some work.

It's possible some LB implementations support it without changes, e.g. a browse through the MetalLB source suggests this would just work, simply because it doesn't try to prevent it, and it already supported mixed protocols on the same IP allocation in aggregate.
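With the MixedProtocolLBService feature gate from #94028 enabled (alpha in Kubernetes 1.20), the original ask reduces to a single Service. A sketch — the names and selector are placeholders, and the backing LB implementation must actually support mixed protocols:

```yaml
# Requires the MixedProtocolLBService feature gate (alpha in v1.20) and a
# cloud provider / LB implementation that supports mixed protocols.
kind: Service
apiVersion: v1
metadata:
  name: dns-mixed
spec:
  selector:
    app: dns
  type: LoadBalancer
  ports:
  - name: dns-tcp
    port: 53
    protocol: TCP
    targetPort: 53
  - name: dns-udp
    port: 53
    protocol: UDP
    targetPort: 53
```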
