Proposal: a simpler way to expose LoadBalancer Services with OpenELB #214

Open · 2 of 5 tasks
KONY128 opened this issue Sep 6, 2021 · 1 comment
Comments

@KONY128
Contributor

KONY128 commented Sep 6, 2021

Proposal: a simpler way to expose LoadBalancer Services with OpenELB

Idea

This mode is inspired by K3s Klipper-LB.

A new mode, Node Proxy, is proposed for OpenELB with the following features:

  • The LoadBalancer Service can be accessed directly through the IP address and Service port of a Kubernetes cluster node, rather than through a NodePort.
  • OpenELB automatically selects a Kubernetes cluster node on which the port corresponding to the Service is not occupied.
  • Service traffic is forwarded to the ClusterIP of the LoadBalancer Service via iptables NAT rules and then further forwarded by kube-proxy to the backend Pods.

The following diagram shows the network topology of a K8s cluster with OpenELB.

[Figure: Network-Topology]

The IP addresses and cluster structure shown in the previous figure are only examples. The topology is described as follows:

  • Users outside the cluster can access nodes through their external-ip, or through their internal-ip when inside the intranet.
  • OpenELB watches all events of the LoadBalancer Services it manages.
  • If the user chooses to expose the Service in this mode, OpenELB creates a DaemonSet or Deployment of proxy Pods for each LoadBalancer Service.
  • Each DaemonSet or Deployment creates proxy Pods named svc-proxy-[service-name]-[namespace].
  • Proxy Pods forward Service traffic to the ClusterIP of the LoadBalancer Service, so users can access the Service from outside the cluster.

Here are some useful notes:

  • Master Nodes are also deployed with a proxy Pod if possible.
  • Nodes with an external-ip always have higher deployment priority.
    • For a Deployment, Nodes with an external-ip are always considered first during scheduling.
    • For a DaemonSet, if any proxy Pods are deployed on Nodes with an external-ip, those external-ips are recorded on the LoadBalancer Service; otherwise, the internal-ips of the Nodes are recorded instead.
  • A proxy Pod occupies the corresponding ports of the Node it runs on.
  • Whether proxy Pods are deployed by a Deployment or a DaemonSet is user-specified. For hot ports such as 80 or 8080, a Deployment is recommended to avoid port collisions.
  • If a proxy Pod becomes unavailable, the status of the corresponding LoadBalancer Service is updated.
  • Supported network protocols include TCP and UDP.

The relationships of these elements are described here:

[Figure: Element-Relationship]

  • OpenELB creates a Deployment or DaemonSet for each hosted LoadBalancer Service. The user specifies which one to create.
  • The Deployment or DaemonSet creates proxy Pods for the LoadBalancer Service.
  • A proxy Pod contains one proxy container with multiple ports, corresponding to the ports configured in the LoadBalancer Service.

The network path of this system looks like this:

[Figure: Network-Structure]

The IP addresses shown in the previous figure are only examples. The procedure is described as follows:

  1. Users send TCP/UDP packets to [node-external-ip]:[service-port], or to [node-internal-ip]:[service-port] if they are inside the intranet. These packets are sent into the proxy Pod.
  2. The proxy Pod forwards the TCP/UDP packets via iptables rules to the ClusterIP of the LoadBalancer Service, which does not exist in the proxy Pod's network namespace, so the packets are sent out to the host (see the sketch after this list).
  3. On the host, kube-proxy handles the TCP/UDP packets with iptables or IPVS rules, and they are finally sent to the target Pod IP managed by the CNI.
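
For illustration only, here is a minimal sketch of the kind of NAT rules the proxy container could install inside its own network namespace. This is not the actual OpenELB implementation; the ClusterIP 10.233.0.100 and port 80 are placeholder values.

# Rewrite the destination of incoming TCP traffic on port 80 to the Service ClusterIP.
# 10.233.0.100:80 is a placeholder; OpenELB would fill in the real ClusterIP and ports.
iptables -t nat -A PREROUTING -p tcp --dport 80 -j DNAT --to-destination 10.233.0.100:80
# Masquerade outgoing packets so replies return through the proxy Pod.
iptables -t nat -A POSTROUTING -j MASQUERADE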

How to Use

Deploy One Proxy Pod By Deployment

Here's an example LoadBalancer Service manifest that deploys one proxy Pod by Deployment:

apiVersion: v1
kind: Service
metadata:
  ...
  annotations:
    # Use this annotation to have this LoadBalancer Service managed by OpenELB.
    lb.kubesphere.io/v1alpha1: porter
    # `deployment` means a single proxy Pod is deployed by a Deployment.
    node-proxy.porter.kubesphere.io/type: deployment
spec:
  ...
  type: LoadBalancer

This creates a Deployment in the porter-system namespace, which deploys a proxy Pod for this LoadBalancer Service.

Once the proxy Pod is successfully running, new annotations appear on this Service:

apiVersion: v1
kind: Service
metadata:
  ...
  annotations:
    lb.kubesphere.io/v1alpha1: porter
    node-proxy.porter.kubesphere.io/type: deployment
    # If any proxy Pod is deployed on a Node with an external-ip,
    # it will be displayed here.
    node-proxy.porter.kubesphere.io/external-ip: 192.122.1.2
    # Same for the internal-ip.
    # Note: a Node can have an external-ip and an internal-ip at the same time.
    node-proxy.porter.kubesphere.io/internal-ip: 10.0.0.1
spec:
  ...
  type: LoadBalancer
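
To verify the result, one could inspect the generated Deployment and read the annotations back; the commands below are an illustrative check, and the Service name my-service is a placeholder.

# List the proxy Deployments created by OpenELB in its control namespace.
kubectl -n porter-system get deployments
# Read the exposure annotations written back onto the Service.
kubectl get service my-service -o jsonpath='{.metadata.annotations}'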

Deploy Proxy Pods By DaemonSet

Here's an example LoadBalancer Service manifest that deploys proxy Pods on all Nodes by DaemonSet:

apiVersion: v1
kind: Service
metadata:
  ...
  annotations:
    # Use this annotation to have this LoadBalancer Service managed by OpenELB.
    lb.kubesphere.io/v1alpha1: porter
    # `daemonset` means proxy Pods are deployed on all Nodes by a DaemonSet.
    node-proxy.porter.kubesphere.io/type: daemonset
spec:
  ...
  type: LoadBalancer

This creates a DaemonSet in the porter-system namespace, which deploys proxy Pods on all Nodes for this LoadBalancer Service.

Once the proxy Pods are successfully running, new annotations appear on this Service:

apiVersion: v1
kind: Service
metadata:
  ...
  annotations:
    lb.kubesphere.io/v1alpha1: porter
    node-proxy.porter.kubesphere.io/type: daemonset
    # If any proxy Pod is deployed on a Node with an external-ip,
    # it will be displayed here.
    node-proxy.porter.kubesphere.io/external-ip: 192.122.1.2,192.107.29.44
    # Same for the internal-ip.
    # Note: a Node can have an external-ip and an internal-ip at the same time.
    node-proxy.porter.kubesphere.io/internal-ip: 10.0.0.1,10.0.0.2
spec:
  ...
  type: LoadBalancer
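
As with the Deployment mode, the result can be verified by listing the DaemonSet and its proxy Pods; this is an illustrative check, not part of the proposal itself.

# List the proxy DaemonSets created by OpenELB and their Pods, one per schedulable Node.
kubectl -n porter-system get daemonsets
kubectl -n porter-system get pods -o wide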

Specify the Proxy and Forwarding Image by ConfigMap

By default, the proxy and forwarding images use the same version as the OpenELB release, which can be seen in the Makefile.

If you want to use a customized image or change the image version, you can create a ConfigMap in the OpenELB control namespace, which defaults to porter-system.

Here's a ConfigMap sample:

apiVersion: v1
kind: ConfigMap
metadata:
  name: node-proxy-config
  # The namespace should be the same as the OpenELB operator's namespace
  namespace: porter-system
data:
  # Specify the custom proxy and forwarding image here
  forward-image: kony168/openelb-forward:v0.4.1005
  proxy-image: kony168/openelb-proxy:v0.4.1005

Exclude Nodes For Proxy Pod Deployment

By default, the proxy Pod scheduling process takes all available Nodes into consideration, including Master Nodes.

If you don't want any proxy Pods to be scheduled on a Node, label it with node-proxy.porter.kubesphere.io/exclude-node.
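
For example, assuming the proposal only checks for the presence of the label key (the node name worker-1 and the label value are placeholders):

# Exclude a Node from proxy Pod scheduling.
kubectl label node worker-1 node-proxy.porter.kubesphere.io/exclude-node=true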

Release Resources

Deleting the corresponding LoadBalancer Service releases resources created in this mode (Deployments, DaemonSets for proxy Pods).
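
For example (the Service name my-service is a placeholder):

# Deleting the Service also removes the proxy Deployment or DaemonSet created for it.
kubectl delete service my-service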

Discussions

Why not just modify the iptables rules on the Nodes directly?

First, this method disturbs the iptables rules of the host.

Second, kube-proxy on every Node flushes its iptables or IPVS rules at a fixed interval.

Third, a Pod that modifies the host needs to be privileged, which is dangerous.

A Pod that carries its own iptables rules consumes a small amount of CPU and memory, but it's worth it.

Why not just expose a Service by NodePort?

First, the ports available in NodePort mode are limited in range (30000-32767 by default). Lower ports, such as port 80, are not officially recommended for use.

Second, hot-port contention cannot be solved with NodePort: you cannot expose multiple Services on a single port.

The Node Proxy mode is designed to solve these problems.

Why not just set the exposed Node IPs as the external-ip of the LoadBalancer Service?

If you set the external-ip of the LoadBalancer Service, kube-proxy modifies the iptables or IPVS rules on all Nodes. This is dangerous because it disturbs the cluster network environment.

After much deliberation, I had to settle for a painful compromise.

The external-ip of the LoadBalancer Service always stays <Pending> in this mode, and the exposure information is displayed in annotations.

Differences from klipper-lb in K3s

No long-running privilege, so it takes effect in more cases

Packet forwarding is disabled by default on Unix-like systems for security. To enable it, you need to set /proc/sys/net/ipv4/ip_forward to 1.

However, you cannot bake a system-level parameter change into a container image. The only way to set it is at runtime, which is a privileged operation in a container.

If you don't grant privilege to the proxy container, it cannot function at all. But if you do, a long-running privileged container in your cluster is not safe.

Klipper-lb tries to modify this parameter at runtime and simply ignores the failure if the operation is unsuccessful.

# Notice: this is a snippet from the Klipper-lb proxy container.

# If the modification fails, the proxy container ignores it.
# However, this modification only takes effect in a privileged container.
echo 1 > /proc/sys/net/ipv4/ip_forward || true
# In our experiments, this parameter defaults to 0 inside the container
# under the latest alpine image, even if it is enabled on the host.
if [ `cat /proc/sys/net/ipv4/ip_forward` != 1 ]; then
    exit 1
fi

All of this means Klipper-lb may not work without privilege in some clusters.

To solve this problem, a privileged init container is introduced in the new mode of OpenELB, used only to modify this system-level parameter.

In this way, we keep the long-running proxy container unprivileged while making OpenELB work in more cases.
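
For illustration, here is a minimal sketch of this pattern in a Pod manifest. The names, the busybox image, and the other field values are placeholders, not the actual OpenELB manifest; the proxy image is taken from the ConfigMap example above.

apiVersion: v1
kind: Pod
metadata:
  name: svc-proxy-example          # placeholder name
  namespace: porter-system
spec:
  initContainers:
  # Privileged init container: enables IP forwarding in the Pod's
  # network namespace, then exits.
  - name: enable-ip-forward
    image: busybox
    command: ["sh", "-c", "echo 1 > /proc/sys/net/ipv4/ip_forward"]
    securityContext:
      privileged: true
  containers:
  # The long-running proxy container stays unprivileged.
  - name: proxy
    image: kony168/openelb-proxy:v0.4.1005
    securityContext:
      privileged: false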

Smaller Resource Consumption

If you create a LoadBalancer Service with multiple ports, Klipper-lb creates a proxy Pod with the same number of containers.

This means, in Klipper-lb:

A LoadBalancer Service with 5 ports => A proxy Pod with 5 containers.

Even though Klipper-lb builds its proxy image on alpine, running multiple containers just for proxying still adds some extra resource consumption.

The new mode of OpenELB uses only one proxy container.

This means, in OpenELB:

A LoadBalancer Service with 5 ports => A proxy Pod with 1 container.

This slightly reduces resource consumption.

You may be wondering: why not have all LoadBalancer Services share a single proxy container on each Node? The answer is stability.

The ports of a LoadBalancer Service are usually fixed and rarely modified over time, which means the proxy Pod Deployment is also rarely modified.

But LoadBalancer Services can be created in the cluster at any time; a shared proxy would need frequent reconfiguration, which would eventually lead to frequent rescheduling of the proxy Pod.

This is not what we expect.

TODO

  • Write the Chinese documentation for this mode.
  • Write test code and generate coverage reports.
  • Decide the formal name of this mode (the name LB Service is confusing).
  • Discuss the release version number of this mode.
  • Specify the proxy and forwarding image by ConfigMap.
@stevefan1999-personal

stevefan1999-personal commented Oct 6, 2022

Hello, I'm a homelab user who is frustrated with Klipper LB and wanted to switch to k0s, but I can't find a replacement for the load balancer, and it seems like this feature would greatly help. I have a couple of ideas and will update them as I come up with more, so this comment might get edited multiple times. Here are the features I've shortlisted so far:

1. Fallback proxy mode

Description

Sometimes it is simply not possible to modify iptables (for example, when running OpenShift); in that case we should fall back to a two-way socket pipe as a workaround. This would greatly reduce performance, but it will at least work in some situations. Think of it as a better version of inlets.

Implementation

Write a simple proxy that pipes TCP/UDP between the external world and the Pod world. If the iptables approach doesn't work, we use that instead, or the user could force this mode explicitly.
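
As a rough illustration of the idea (not a concrete proposal for OpenELB), a user-space two-way pipe can be set up with a tool like socat; the ClusterIP 10.233.0.100 and port 80 are placeholders.

# Listen on the Service port and pipe each TCP connection to the Service ClusterIP.
socat TCP-LISTEN:80,fork,reuseaddr TCP:10.233.0.100:80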

2. Health/Live checking

Description

A normal cloud-based load balancer usually has liveness checking to determine whether the load balancer is accepting traffic. This matters especially in homelab environments, where a node might be dead at some point in time (at least I'm a fairly extreme case, since I run a K8S cluster over the Internet with WireGuard as the network backbone), and we need to shift the load balancer to other hosts as soon as possible. Specifically, in my case, without health checking we cannot be sure about the load balancer address, so external-dns cannot update the DNS records for my API services as soon as possible. I also need this because UPnP hole-punching sometimes dies off due to a timeout (some UPnP-supporting routers only keep the mapping up to some point in time and behave unpredictably after that), and health checking would greatly help to make sure the load balancer is doing well.

Implementation

As such, we should implement a health checker that sends TCP/UDP* packets, with node affinity set so that the checker runs on any node other than the load balancer node. We could further detect whether the target port speaks an application-level protocol such as HTTP/TLS/MySQL, so that we can also do application-specific checking for a more accurate result.

*: UDP is connectionless and hence health check may not work
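
For a basic TCP-level reachability probe, something as simple as the following could work, depending on the netcat variant installed; the address 192.122.1.2 and port 80 are placeholders.

# Probe a node's Service port with a 2-second timeout; exit status 0 means the TCP port is reachable.
nc -z -w 2 192.122.1.2 80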

3. IP status update

Description & Implementation

There is a field in ServiceSpec called spec.loadBalancerIP, and I think it speaks for itself. We could also update the IPs based on the health checks, so that they reflect the most up-to-date hosts that can reach the service externally.
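
For reference, the field looks like this in a Service spec (the IP is a placeholder for whichever node the health check confirms as reachable):

spec:
  type: LoadBalancer
  loadBalancerIP: 192.122.1.2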
