
Built-in load balancer #5134

Open
SixFive7 opened this issue Mar 15, 2022 · 9 comments

@SixFive7
Contributor

SixFive7 commented Mar 15, 2022

Feature Request

Description

Load balancers are ubiquitous in cloud environments, but in on-premises setups they are not standardized and require manual work. Letting Talos handle this requirement internally would therefore relieve a significant burden when setting up on-premises clusters with Talos. See #2711 for a partial rationale.

Current solutions

Currently (v0.14) the available solutions in Talos are:

  • Use an external dedicated load-balancer (service)
    • Creates a cluster dependency on another system or provider.
    • No hosted services are really available unless all your nodes are hosted on the same cloud that provides the load balancer.
    • Self-hosting (HAProxy, NGINX, F5, etc.) adds additional hardware, software, and maintenance requirements, and takes real effort to do right.
  • DNS records
    • Round-robin DNS assumes clients re-query periodically; most CNI components do not and stick to the first IP resolved during startup.
    • Does not provide failover.
    • Does not provide high availability.
  • Layer 2 shared IP #3111
    • Provides failover but no load balancing.
    • High availability, but (relatively) slow failovers.
    • Requires promiscuous mode allowing gratuitous ARP broadcasts. Not always allowed or possible in stretched VPCs.
    • On-premises multi-datacenter setups require custom network setups (and some hardware) to enable layer 2 spanning.
  • Tentative: VIP over BGP
    • Can be done with MetalLB or kube-vip and some dedication.
    • No built-in option as of yet. #4334
  • Tentative: IPVS
    • For now only a mention in #2711

Suggested additional solution

A better solution would be to add Talos native load balancing on the node side. This would:

  • Be a CNI agnostic solution.
  • Allow native built-in load balancing without any external software or hardware requirements, or other dependencies.
  • Allow self hosting without any maintenance burden other than the already required Talos config.
  • Enable faster failovers than the current solutions.
  • Enable true high availability given at least three control plane nodes.
  • Not be dependent on ARP or BGP and thus work:
    • In every stretched VPC, including those that do not support or allow promiscuous adapters.
    • In any multi datacenter setup without a spanned layer 2 broadcast domain or BGP setup.
    • Without any extra infrastructure components like MetalLB or kube-vip
  • Allow the kube API (and possibly also the load balancer) to only be exposed to localhost?
  • Combined with KubeSpan, work in all of the above scenarios without publicly exposing the API endpoint, even before a CNI is online.

Possible implementation

  • Use parts of the client side load balancing code to create a Talos node side load balancer for worker and control plane nodes.
  • Bind the load balancer to localhost on every node and set the Kubernetes API server endpoint to 127.0.0.1:someport.
  • Optionally expose the load balancer port to the public on the control plane nodes as well?

Possible usage

Instead of running

talosctl gen config <cluster name> <cluster endpoint>

one would specify three <cluster endpoints>:

talosctl gen config <cluster name> <endpoint1>,<endpoint2>,<endpoint3>

This would result in the actual cluster.controlPlane.endpoint being set to https://127.0.0.1:someport, with a native local load balancer behind it balancing requests across all three actual endpoints.
All three endpoints would, of course, still be DNS names, so no worker config would need to change if a control plane node ever changes its IP.
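
A minimal sketch of what the resulting config could look like under this proposal (cluster.controlPlane.endpoint is the existing machine config field; the port is a placeholder for "someport" above):

cluster:
  controlPlane:
    # Every node would talk to its local load balancer, which forwards
    # requests to <endpoint1>, <endpoint2> and <endpoint3>.
    endpoint: https://127.0.0.1:6444   # placeholder port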

Ecosystem

Unsurprisingly, more people have run into this speed bump while setting up an on-premises cluster. In fact, there is a large upstream thread for exactly this ("don't require a load balancer between cluster and control plane and still be HA") and some partial fixes (KEP-3037: client-go alternative services).
However, as there are many moving parts, it will probably take a long time for multiple API endpoints to be supported natively. Furthermore, it will take even more years before all components of all CNIs support this. Hence it would be better to build it into Talos now and, as a feature of Talos, remove the load balancer requirement.

As an example solution, there is Rancher's RKE implementation, which by default:

  • Points the kubelet to 127.0.0.1:6443.
  • Points kube-proxy to 127.0.0.1:6443.
  • Runs an nginx-proxy container (presumably as a static pod) on port 6443.
  • Ties the NGINX target health checks into their ecosystem.

Some related thoughts

  • Talos node-native load balancing could use the API target health checks to inform the boot process of API availability, possibly reducing the many "try, fail, retry" errors on boot to a single "waiting for network connectivity to an API server" message.
  • Talos could integrate the API health checks from multiple viewpoints into the existing health checks and logging framework and report better on transient API server failures. #4088
  • Talos could spread request load intelligently. Say you have three sites, each with a single API server; it could slightly prefer to send API requests to the local API server (the one with the lowest latency detected by the load balancer logic).
  • Maybe it's better to keep the API server variable set to an FQDN but add that FQDN to the hosts file so it resolves to localhost, in case that variable gets exposed to some external services? See the workaround below.
  • Maybe we could choose not to expose the Kubernetes API by default but have the option of letting talosctl proxy that port on demand, much like we do with kubectl when we need to connect to a port on a pod. It would reduce the attack surface to just the Talos API at rest.
  • This approach might also solve any caching issues? #4470
  • Combining this approach with KubeSpan, one could bind the actual API server to only localhost and the KubeSpan adapter, then relegate external API access to the port exposed by the load balancer and any firewall rules on it. #4421 #1417 #4898
  • A load balancer might provide a level of stability and flexibility that removes the need for a VIP or BGP solution and their inherent complexities. #4604 #4334

Current workaround

For anybody who wants a similar solution right now you can:


Instructions

  • Determine your API server FQDN, but use a different port than 6443. For example https://kube.mycluster.mydomain.com:6444.
  • Add this FQDN to the hosts file.
    machine:
      network:
        extraHostEntries:
        - ip: 127.0.0.1
          aliases:
          - kube.mycluster.mydomain.com
    This way it always resolves to localhost on the worker and control plane nodes, but not on any external services that have somehow received that FQDN config.
  • Add a static HAProxy pod to run a mini load balancer on every node.
    machine:
      files:
      - path: /etc/kubernetes/manifests/kubernetes-api-haproxy.yaml
        permissions: 0o666
        op: create
        content: |
          apiVersion: v1
          kind: Pod
          metadata:
            name: kubernetes-api-haproxy
            namespace: kube-system
          spec:
            hostNetwork: true
            containers:
              - name: kubernetes-api-haproxy
                image: haproxy
                livenessProbe:
                  httpGet:
                    host: localhost
                    path: /livez
                    port: 6445
                    scheme: HTTP
                volumeMounts:
                - name: kubernetes-api-haproxy-config
                  mountPath: /usr/local/etc/haproxy/haproxy.cfg
                  readOnly: true
            volumes:
            - name: kubernetes-api-haproxy-config
              hostPath:
                path: /etc/kubernetes/manifests/haproxy.cfg
                type: File
      - path: /etc/kubernetes/manifests/haproxy.cfg
        permissions: 0o666
        op: create
        content: |
          global
              log stdout format raw daemon
    
          defaults
              log global
              option tcplog
    
              option http-keep-alive
              timeout connect 3s
              timeout client 1h
              timeout server 1h
              timeout tunnel 1h
              timeout client-fin 1m
              timeout server-fin 1m
              retries 1
    
              email-alert mailers mailservers
              email-alert from kube1-kubernetes-api-haproxy@your.mailserver.com
              email-alert to kube1-kubernetes-api-haproxy@your.mailserver.com
              email-alert level notice
    
          mailers mailservers
              mailer yourdomaintld your.mailserver.com:25
    
          frontend kube
              mode tcp
              bind :6444
              default_backend kubes
    
          backend kubes
              mode tcp
              balance roundrobin
              option httpchk GET /readyz
              http-check expect status 200
              default-server verify none check check-ssl inter 2s fall 2 rise 2
              server kube1 kube1.mycluster.mydomain.com:6443
              server kube2 kube2.mycluster.mydomain.com:6443
              server kube3 kube3.mycluster.mydomain.com:6443
    
          frontend stats
              mode http
              bind :6445
              monitor-uri /livez
              default_backend stats
    
          backend stats
              mode http
              stats refresh 5s
              stats show-node
              stats show-legends
              stats show-modules
              stats hide-version
              stats uri /
    Note that:
    • As of v0.15/v1.0 there is now a better way to add a static pod to the Talos config.
    • The haproxy.cfg file is placed in the wrong directory, resulting in an (otherwise harmless) error message on boot.
    • You could define a different email alert sender per node or not.
  • Check port 6445 on every node for its view on API availability (a quick check with curl is sketched after this list).
  • Do not forget to point the A records of the FQDN for external visitors at all three control plane nodes (or at a load balancer running on Kubernetes if you need HA from an external client as well).
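
A quick way to do that availability check (a sketch assuming the HAProxy config above; <node-ip> is a placeholder for the node's address):

# Liveness of the local HAProxy itself (the same endpoint the pod's livenessProbe uses)
curl -i http://<node-ip>:6445/livez

# HAProxy stats page showing the health of the kube1/kube2/kube3 API server backends
curl http://<node-ip>:6445/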
@smira
Member

smira commented Mar 15, 2022

By the way, Talos runs the Kubernetes control plane components pointed at localhost:6443 for the API server endpoint, so they don't require the load-balancer to be up.

@edude03

edude03 commented May 14, 2022

I'm interested in this as well. Mostly for HA control plane without an external LB.

@SixFive7
Contributor Author

I just noticed the https://github.com/siderolabs/talos/releases/tag/v1.5.0-alpha.1 patch notes. Some exciting news in the Kubernetes API Server In-Cluster Load Balancer section. Looking forward to seeing how complete this built-in load balancer is going to be. Especially curious whether it can also function as a load balancer for external clients and whether monitoring can be included.
Good to see Talos OS growing. Exciting times!

@smira
Member

smira commented Jul 11, 2023

> I just noticed the https://github.com/siderolabs/talos/releases/tag/v1.5.0-alpha.1 patch notes. Some exciting news in the Kubernetes API Server In-Cluster Load Balancer section. Looking forward to seeing how complete this built-in load balancer is going to be. Especially curious whether it can also function as a load balancer for external clients and whether monitoring can be included. Good to see Talos OS growing. Exciting times!

This feature is in-cluster exclusively; it makes sure the cluster can run even if the external load-balancer is down (or it might prefer local traffic if the external load-balancer has higher latency).


github-actions bot commented Jul 4, 2024

This issue is stale because it has been open 180 days with no activity. Remove stale label or comment or this will be closed in 7 days.

@github-actions github-actions bot added the Stale label Jul 4, 2024
@SixFive7
Contributor Author

SixFive7 commented Jul 4, 2024

I'd still be very much interested in a built-in load balancer to replace our bespoke HAProxy static pod setup.

@smira
Member

smira commented Jul 4, 2024

Talos supports KubePrism for internal load-balancing.
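
For reference, KubePrism is enabled via the machine config, roughly like this (a minimal sketch based on the Talos documentation; 7445 is the commonly used port and can be changed):

machine:
  features:
    kubePrism:
      enabled: true
      port: 7445

Kubernetes components on the node can then reach the API server via https://localhost:7445, balanced across all known control plane endpoints.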

@SixFive7
Contributor Author

SixFive7 commented Jul 4, 2024

Yes, and KubePrism is excellent. I thoroughly enjoyed reading about its motivation and design.

However, we are still left with all the tradeoffs of the load balancer for our external Kubernetes API access. For us this means maintaining a bespoke HAProxy setup (implemented as static pods that run before the cluster is online) on our otherwise rather feature-complete Talos cluster.

@smira
Member

smira commented Jul 4, 2024

It just feels that external access is much more complicated, and it shouldn't be solved this way. With KubePrism enabled, Talos Linux and Kubernetes itself no longer depend on the load-balancer.

External access can be provided by running load-balancers outside the cluster, using simple round-robin DNS, using a direct endpoint of a control plane node, etc. There are many options, and all of them work equally well.

I don't quite see how running HAProxy on the machine itself would help here, since if the machine goes down, access will no longer be possible anyway.
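
For completeness, the round-robin DNS option mentioned above could look like this (a sketch using the example domain from this thread and RFC 5737 placeholder addresses):

kube.mycluster.mydomain.com.  300  IN  A  192.0.2.11
kube.mycluster.mydomain.com.  300  IN  A  192.0.2.12
kube.mycluster.mydomain.com.  300  IN  A  192.0.2.13

Clients that re-resolve will spread across the control plane nodes, though as noted earlier in the thread this provides neither health checking nor fast failover.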

@github-actions github-actions bot removed the Stale label Jul 5, 2024