Proposal: a simpler way to expose LoadBalancer Service by OpenELB #214
Hello, I'm a homelab user who is frustrated with the Klipper LB and wanted to switch to k0s, but I can't find a replacement for the load balancer, and it seems like this feature would help greatly. I have a couple of ideas and will update them as I come up with more, so this might get edited multiple times. Here are the features I have shortlisted so far:

1. Fallback proxy mode

Description
Sometimes it is simply not possible to modify iptables (for example, when running OpenShift). In that case we should fall back to a two-way socket pipe as a workaround. This would reduce performance considerably, but it will at least work in some situations. Think of it as a better version of inlets.

Implementation
Write a simple proxy that pipes TCP/UDP traffic between the external world and the Pod. If the iptables approach doesn't work, we use the proxy instead; the user could also force its use explicitly.

2. Health/Live checking

Description
A typical cloud load balancer performs liveness checks to determine whether the load balancer is accepting traffic. Especially in a homelab environment, where a node might be dead at some point in time (my setup is fairly extreme: I run a K8s cluster over the Internet with WireGuard as the network backbone), we need to shift the load balancer to another host as soon as possible. In my case specifically, without health checking we cannot be sure about the load balancer address, so external-dns cannot update the DNS record for my API services promptly. I also need this because UPnP hole punching sometimes dies off due to timeouts (some routers with UPnP support only keep the mapping alive up to some point in time and behave unpredictably after that), so this would greatly help in making sure the load balancer is doing well.

Implementation
Implement a health checker that sends TCP/UDP* packets, and use node affinity to select any one of the nodes other than the current load balancer node. We can further detect whether the target port speaks an application-level protocol such as HTTP/TLS/MySQL, so that we can also do application-specific checking for a more accurate result.

*: UDP is connectionless, so a UDP health check may not work.

3. IP status update

Description & Implementation
There is a field in ServiceSpec called
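As one possible, Kubernetes-native way to express the TCP health-check idea above, a probe could be attached to the proxy Pod itself. This is only an illustrative sketch; the Pod name, image, and port below are assumptions, not part of OpenELB:

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: svc-proxy-example          # hypothetical proxy Pod
  namespace: porter-system
spec:
  containers:
  - name: proxy
    image: example/proxy:latest    # illustrative image
    ports:
    - containerPort: 80
    # TCP probes let the cluster notice a dead proxy quickly, so the
    # load balancer can be shifted to another node, as proposed above.
    readinessProbe:
      tcpSocket:
        port: 80
      periodSeconds: 5
    livenessProbe:
      tcpSocket:
        port: 80
      initialDelaySeconds: 5
      periodSeconds: 10
```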
Proposal: a simpler way to expose LoadBalancer Service by OpenELB
Idea
A new mode, `Node Proxy`, is developed for OpenELB, which has the following features:

The following diagram shows the network topology of a K8s cluster with OpenELB.
The IP addresses and cluster structure shown in the previous figure are only examples. The topology is described as follows:
The proxy workload for a Service is named `svc-proxy-[service-name]-[namespace]`.

Here are some useful notes:

- Nodes with an `external-ip` always have a higher deployment priority.
- For commonly used ports such as `80` or `8080`, we suggest using a Deployment to avoid port collisions.

The relationships of these elements are described here:
The network path of this system looks like this:
The IP addresses shown in the previous figure are only examples. The procedure is described as follows:
Users access `[node-external-ip]:[service-port]` or `[node-internal-ip]:[service-port]` (for the internal IP, users need to be in an intranet environment) to send UDP/TCP packets, which are then forwarded into the proxy Pod.

How to Use
Deploy One Proxy Pod By Deployment
Here's a LoadBalancer example file to deploy one proxy Pod by Deployment:
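A minimal sketch of such a Service is shown below; the annotation keys and values are assumptions for illustration and may differ from what OpenELB actually uses:

```yaml
kind: Service
apiVersion: v1
metadata:
  name: my-app
  namespace: default
  annotations:
    # Assumed annotation keys; the real ones may differ.
    lb.kubesphere.io/v1alpha1: porter
    protocol.porter.kubesphere.io/v1alpha1: node-proxy
    node-proxy.porter.kubesphere.io/type: deployment   # hypothetical: one proxy Pod via a Deployment
spec:
  type: LoadBalancer
  selector:
    app: my-app
  ports:
  - name: http
    protocol: TCP
    port: 80
    targetPort: 8080
```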
This creates a Deployment in the namespace `porter-system`, which will deploy a proxy Pod for this LoadBalancer Service.

Once the proxy Pod is running successfully, some new annotations will appear on this Service:
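For illustration only, the exposure annotations might look roughly like this; the keys and values below are assumptions:

```yaml
metadata:
  annotations:
    # Hypothetical keys recording where the Service is reachable
    # once the proxy Pod has been scheduled.
    node-proxy.porter.kubesphere.io/expose-ip: "192.168.0.2"
    node-proxy.porter.kubesphere.io/expose-port: "80"
```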
Deploy Proxy Pods By DaemonSet
Here's a LoadBalancer example file to deploy proxy Pods on all nodes by DaemonSet:
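Again, a minimal sketch with assumed annotation keys; the actual keys may differ:

```yaml
kind: Service
apiVersion: v1
metadata:
  name: my-app
  namespace: default
  annotations:
    # Assumed annotation keys; the real ones may differ.
    lb.kubesphere.io/v1alpha1: porter
    protocol.porter.kubesphere.io/v1alpha1: node-proxy
    node-proxy.porter.kubesphere.io/type: daemonset   # hypothetical: proxy Pods on every node
spec:
  type: LoadBalancer
  selector:
    app: my-app
  ports:
  - name: http
    protocol: TCP
    port: 80
    targetPort: 8080
```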
This creates a DaemonSet in the namespace `porter-system`, which will deploy proxy Pods on all nodes for this LoadBalancer Service.

Once the proxy Pods are running successfully, some new annotations will appear on this Service:
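In the DaemonSet case one would expect exposure information for every node; purely as a hypothetical illustration:

```yaml
metadata:
  annotations:
    # Hypothetical keys listing every node IP the Service is exposed on.
    node-proxy.porter.kubesphere.io/expose-ips: "192.168.0.2,192.168.0.3,192.168.0.4"
    node-proxy.porter.kubesphere.io/expose-port: "80"
```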
Specify the proxy and forwarding image by ConfigMap
By default, the proxy and forwarding images are set to the same version as the OpenELB release, which can be seen in the `Makefile`.

If you want to use a customized image or change the image version, you can create a ConfigMap in the OpenELB control namespace, which defaults to `porter-system`.

Here's a ConfigMap sample:
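A sketch of what such a ConfigMap could look like; the ConfigMap name, data keys, and image references are assumptions:

```yaml
apiVersion: v1
kind: ConfigMap
metadata:
  name: node-proxy-config          # hypothetical name
  namespace: porter-system
data:
  # Hypothetical keys that override the proxy and forwarding images.
  proxy-image: "kubesphere/openelb-proxy:v0.4.2"
  forward-image: "kubesphere/openelb-forward:v0.4.2"
```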
Exclude Nodes For Proxy Pod Deployment
By default, the proxy Pod scheduling process takes all available Nodes into consideration, including master Nodes.
If you don't want any proxy Pod to be deployed on a Node, label it with `node-proxy.porter.kubesphere.io/exclude-node`.
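For example, the label could be applied on the Node object as follows (the node name is an example, and it is an assumption that only the presence of the key is checked):

```yaml
apiVersion: v1
kind: Node
metadata:
  name: worker-1                   # the node to exclude (example name)
  labels:
    # Assumption: only the presence of this key matters, so any value works.
    node-proxy.porter.kubesphere.io/exclude-node: "true"
```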
Release Resources
Deleting the corresponding LoadBalancer Service releases resources created in this mode (Deployments, DaemonSets for proxy Pods).
Discussions
Why not just modify the iptables rules of Nodes?

First, this method disturbs the iptables rules of the host.

Second, the kube-proxy on every Node flushes the iptables or IPVS rules at a fixed interval.

Third, letting a Pod modify the host means it must be privileged, which is dangerous.

Compared with modifying iptables rules directly, a proxy Pod consumes a small amount of extra CPU and memory, but it's worth it.
Why not just expose a Service by NodePort?
First, the ports available in NodePort mode are limited in range (the default is 30000-32767). Lower ports, such as port 80, are not officially recommended for use.

Second, contention for popular ports cannot be solved with NodePort: you cannot expose multiple Services on a single port.

The LB Service mode is created to solve these problems.
Why not just set exposed Node IPs as the `external-ip` of the LoadBalancer Service?

If you set the external-ip of the LoadBalancer Service, kube-proxy will modify the iptables or IPVS rules on all Nodes. This is very dangerous because it disturbs the cluster network environment.
After much deliberation, I had to settle for a painful compromise.
The external-ip of the LoadBalancer Service will always stay `<Pending>` in this mode, and the exposure information is displayed in annotations.
Differences from klipper-lb in K3s
No runtime privilege, so it takes effect in more cases
Forwarding packets on Unix-like systems is disabled by default for security. To enable it, you need to set `/proc/sys/net/ipv4/ip_forward` to 1.

However, you cannot build a container image with a system-level parameter already modified. The only way to set it is at runtime, which is a privileged operation in a container.

If you don't grant privilege to the proxy container, it cannot function at all. But if you do, a long-running privileged container in your cluster is not safe.
Klipper-lb tries to modify this parameter at runtime. If this operation is unsuccessful, it chooses to ignore it.
All of this means Klipper-lb may not work without privilege in some clusters.
To solve this problem, a privileged init container is introduced in the new mode of OpenELB; it is used only to modify this system-level parameter.
In this way, we ensure security and make OpenELB take effect in more cases.
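A minimal sketch of this pattern, assuming a busybox-based init container (the image names and Pod layout are illustrative, not the actual OpenELB manifests):

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: svc-proxy-example
  namespace: porter-system
spec:
  initContainers:
  # The only privileged step: enable packet forwarding, then exit.
  - name: enable-ip-forward
    image: busybox:1.36            # illustrative image
    command: ["sh", "-c", "echo 1 > /proc/sys/net/ipv4/ip_forward"]
    securityContext:
      privileged: true
  containers:
  # The long-running proxy container runs without privilege.
  - name: proxy
    image: example/proxy:latest    # illustrative image
    securityContext:
      privileged: false
```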
Smaller Resource Consumption
If you create a LoadBalancer Service with multiple ports, Klipper-lb will create a proxy Pod with the same number of containers.
This means, in Klipper-lb:
A LoadBalancer Service with 5 ports => A proxy Pod with 5 containers.
Even though Klipper-lb builds its proxy image on Alpine, there is still some extra resource consumption from running multiple containers just for proxying.

In the new mode of OpenELB, only one proxy container is used.
This means, in OpenELB:
A LoadBalancer Service with 5 ports => A proxy Pod with 1 container.
This slightly reduces resource consumption.

You may be wondering: why not make all LoadBalancer Services share just one proxy container on a Node? The answer is stability.

The ports of a LoadBalancer Service are usually fixed and rarely modified over time, which means the proxy Pod Deployment is also rarely modified.

But LoadBalancer Services can be created at any time in the cluster, which would force frequent reconfiguration of a shared proxy Deployment and eventually lead to frequent rescheduling of the proxy Pod.
This is not what we expect.
TODO
Find a better name for this mode (the current name `LB Service` is confusing).