# Ingress, LoadBalancer and LoadBalancerClaim proposal

**Authors**: @mqliang, @ddysher

Nov 2016

## Overview

The proposal aims to define a set of APIs to abstract loadbalancer resources,
similar to how PV/PVC works. In the end, we will:

* introduce a new API object `LoadBalancer` to represent a loadbalancer (think of PV)
* introduce a new API object `LoadBalancerClaim` to claim a new or existing
  LoadBalancer (think of PVC)
* change the existing Ingress API to include an `LBSource` field, which either
  uses a LoadBalancer directly or refers to one via a LoadBalancerClaim (think
  of a Pod using a PV or PVC)

We will also introduce network resources such as bandwidth and IOPS; a
LoadBalancerClaim uses these attributes to claim a LoadBalancer. Though the
model is similar to PV/PVC, it is important to note that the
LoadBalancer/LoadBalancerClaim binding is not 1-to-1; rather, it is 1-to-many.
That is, one LoadBalancer can serve multiple LoadBalancerClaims.

## Background

### Current Ingress behavior

Ingress can be used to expose a service in the kubernetes cluster:

* a cluster admin deploys an ingress-controller Pod beforehand
* a user creates an Ingress resource
* the ingress-controller Pod list&watches **all** Ingress resources in the
  cluster; when it sees a new Ingress resource:
  * on a cloud provider, it calls the cloud provider to sync the L7
    loadbalancing rules
  * on bare-metal, it syncs the nginx (or haproxy, etc.) config and then reloads
* a user outside the cluster can then access services in the cluster:
  * on bare-metal, by accessing the IP of the node on which the
    ingress-controller Pod is running; the ingress-controller Pod forwards
    requests into the cluster based on the rules defined in the Ingress resource
  * on a cloud provider, by accessing the IP provided by the cloud provider
    loadbalancer; the cloud provider forwards requests into the cluster based
    on the rules defined in the Ingress resource

### Limitations of current Ingress implementation

* How many ingress-controller Pods should run in a cluster? Should every
  ingress-controller Pod list&watch all Ingress resources? There is no way to
  bind or schedule an Ingress resource to a particular ingress-controller Pod,
  which results in:
  * insufficient or excessive use of networking resources
  * reload storms when an Ingress resource is updated, or due to Pod changes
* An Ingress resource actually describes internet L7 loadbalancing rules;
  intranet L7 loadbalancing rules are not supported yet. Eventually, we need a
  general mechanism by which both Ingress (internet) and intranet L7 rules
  "consume" a loadbalancer.
* On bare-metal, there is no high availability, because clients need to know
  the IP address of the node where the ingress-controller Pod is running; in
  case of a failure, the ingress-controller Pod is moved to a different node
  (and that address changes).

## Goal and NonGoal

### Goal

* Define `LoadBalancer` API
* Define `LoadBalancerClaim` API
* Define network attributes of LoadBalancer
* Define loadbalancer provider, what it is and how it works
* Define loadbalancer scheduling

### NonGoal

* LoadBalancer HA on bare-metal
* LoadBalancerClass: different types (internet or intranet), different qos levels, etc.
* LoadBalancer scheduling over-commitment

## Design

### LoadBalancer

`LoadBalancer` is a first-class API in kubernetes. It represents the network
resource for internet and intranet load balancing. A LoadBalancer will
eventually be "used" or "consumed" via Ingress resources, which define
forwarding rules to a set of Pods. Different LoadBalancers have different
network attributes (bandwidth, IOPS, etc.).

### LoadBalancerClaim

`LoadBalancerClaim` is also a first-class API in kubernetes and, as its name
suggests, is used to claim a LoadBalancer. A LoadBalancerClaim claims a
LoadBalancer based on the network attributes mentioned above. If no existing
LoadBalancer satisfies a claim, a new one can be created on the fly, just as a
PV can be dynamically provisioned.

For more background see https://github.com/kubernetes/kubernetes/issues/30151

### LoadBalancer Provider Interface

The LoadBalancer Provider Interface is an interface to Create/Update/Delete
LoadBalancers (a sketch follows the list). There can be multiple LoadBalancer
provider implementations, such as:

* AWS loadbalancer provider
* GCE loadbalancer provider
* bare-metal nginx loadbalancer provider
* bare-metal highly-available nginx loadbalancer provider
* bare-metal haproxy loadbalancer provider
* bare-metal highly-available haproxy loadbalancer provider
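The exact interface is an implementation detail; as a rough sketch (the method
names and signatures here are assumptions, not part of the proposal), it could
look like:

```go
// LoadBalancerProvider is a hypothetical sketch of the provider interface;
// the exact method set is not fixed by this proposal. The types refer to
// the API objects defined later in this document.
type LoadBalancerProvider interface {
	// CreateLoadBalancer provisions a loadbalancer with the capacity
	// given in spec, e.g. when dynamically provisioning for a claim.
	CreateLoadBalancer(spec *LoadBalancerSpec) (*LoadBalancer, error)

	// UpdateLoadBalancer syncs the L7 loadbalancing rules derived from
	// the Ingresses consuming this loadbalancer.
	UpdateLoadBalancer(lb *LoadBalancer, ingresses []Ingress) error

	// DeleteLoadBalancer deprovisions the loadbalancer and everything
	// backing it (e.g. a bare-metal nginx Pod and its ConfigMap).
	DeleteLoadBalancer(lb *LoadBalancer) error
}
```

Each implementation in the list above would satisfy this interface; the
loadbalancer-controller (described next) would be its only caller.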
### Loadbalancer-controller

The loadbalancer-controller is responsible for:

* list&watching Ingress resources and calling the loadbalancer provider to
  update the corresponding loadbalancer's loadbalancing rules
* picking the best-matching LoadBalancer from the existing LoadBalancer pool
  for a LoadBalancerClaim, based on the network attributes the claim requests
* calling the loadbalancer provider to dynamically provision a LoadBalancer
  for a LoadBalancerClaim when no existing LoadBalancer matches
* recycling or deprovisioning a LoadBalancer when it has no consumers

### Loadbalancer scheduling

As mentioned before, the LoadBalancer/LoadBalancerClaim binding is not
exclusive: multiple LoadBalancerClaims can be bound to one LoadBalancer. For
example, given a LoadBalancer with 3G of bandwidth, we can bind 6
LoadBalancerClaims, each requesting 500M of bandwidth, to it. In such cases we
need "scheduling" logic; a sketch of the fit check is shown below.

Further, we can eventually introduce a request/limit model for network
resources, to achieve the functionality already implemented for compute
resources, for example QoS and overcommit.
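As a rough illustration of that fit check (the helper below is for
illustration only, not proposed API), the controller could sum the bandwidth
requests already bound to a loadbalancer and compare against its capacity:

```go
// claimFits reports whether claim's bandwidth request still fits on lb,
// given the claims already bound to it. A hypothetical sketch; it assumes
// the Quantity type from the Kubernetes resource package.
func claimFits(lb *LoadBalancer, bound []LoadbalancerClaim, claim *LoadbalancerClaim) bool {
	capacity := lb.Spec.Capacity[ResourceBandWidth]

	// Sum the bandwidth requests of already-bound claims plus the new one.
	// A missing entry in Requests contributes a zero Quantity.
	var used resource.Quantity
	for _, c := range bound {
		used.Add(c.Spec.Resources.Requests[ResourceBandWidth])
	}
	used.Add(claim.Spec.Resources.Requests[ResourceBandWidth])

	// The claim fits iff the summed requests do not exceed capacity.
	return used.Cmp(capacity) <= 0
}
```

The same check generalizes to other network resources (e.g. IOPS) by iterating
over every resource name requested by the claim.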
#### Manually assign LoadBalancerClaim to a LoadBalancer

A user can also manually assign a LoadBalancerClaim to a LoadBalancer instead
of letting the loadbalancer-controller schedule one. In that case, the
resource requests of all claims bound to a loadbalancer may exceed the
loadbalancer's capacity, so we validate requests against capacity when the
loadbalancer-controller updates the loadbalancing rules: sort all Ingresses
that "consume" the same LoadBalancer by creation time; if the summed requests
exceed the loadbalancer's capacity, skip updating rules for the last few
Ingresses and emit an event. This mirrors how the kubelet validates requests
for Pods on its side.
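A sketch of that validation (again illustrative only; `claimFor`, which
resolves an Ingress to its claim, is an assumed helper):

```go
// admittedIngresses splits the Ingresses consuming lb into those whose
// cumulative bandwidth requests fit within lb's capacity (oldest first)
// and the remainder, for which rule updates are skipped and an event is
// emitted. A hypothetical sketch.
func admittedIngresses(lb *LoadBalancer, ingresses []Ingress) (admitted, rejected []Ingress) {
	// Oldest Ingresses win, matching the "skip the last few" rule.
	sort.Slice(ingresses, func(i, j int) bool {
		return ingresses[i].CreationTimestamp.Time.Before(ingresses[j].CreationTimestamp.Time)
	})

	capacity := lb.Spec.Capacity[ResourceBandWidth]
	var used resource.Quantity
	for _, ing := range ingresses {
		used.Add(claimFor(ing).Spec.Resources.Requests[ResourceBandWidth])
		if used.Cmp(capacity) > 0 {
			// Over capacity: this and all newer Ingresses are skipped.
			rejected = append(rejected, ing)
		} else {
			admitted = append(admitted, ing)
		}
	}
	return admitted, rejected
}
```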
## API

### Network Resource

```go
const (
	ResourceBandWidth ResourceName = "network-bandwidth"
	ResourceIOPS      ResourceName = "network-iops"
)
```

We can introduce more network resources in the future.

### Loadbalancer API

```go
type LoadBalancer struct {
	unversioned.TypeMeta `json:",inline"`
	ObjectMeta           `json:"metadata,omitempty"`

	// Spec defines a loadbalancer owned by the cluster.
	Spec LoadBalancerSpec `json:"spec,omitempty"`

	// Status represents the current information about the loadbalancer.
	Status LoadBalancerStatus `json:"status,omitempty"`
}

type LoadBalancerSpec struct {
	// Capacity represents the actual resources of the loadbalancer.
	Capacity ResourceList `json:"capacity"`
	// Source represents the location and type of the loadbalancer to use.
	LoadBalancerSource `json:",inline"`
}

type LoadBalancerSource struct {
	GCELoadBalancer       *GCELoadBalancerSource       `json:"gceLoadBalancer,omitempty"`
	AWSLoadBalancer       *AWSLoadBalancerSource       `json:"awsLoadBalancer,omitempty"`
	BareMetalLoadBalancer *BareMetalLoadBalancerSource `json:"bareMetalLoadBalancer,omitempty"`
	/*
		more loadbalancer sources
	*/
}

type GCELoadBalancerSource struct {
	// Unique name of the LoadBalancer resource, used to identify the LoadBalancer in GCE.
	LBName string `json:"lbName"`
}

type AWSLoadBalancerSource struct {
	// Unique name of the LoadBalancer resource, used to identify the LoadBalancer in AWS.
	LBName string `json:"lbName"`
}

type BareMetalLoadBalancerSource struct {
	Type BareMetalLoadBalancerType `json:"type"`
	// A bare-metal loadbalancer is actually an nginx or haproxy app
	// deployed in the "kube-system" namespace.
	ServiceName string `json:"serviceName"`
}

type BareMetalLoadBalancerType string

const (
	NginxLoadBalancer   BareMetalLoadBalancerType = "Nginx"
	HaproxyLoadBalancer BareMetalLoadBalancerType = "Haproxy"
)

// LoadBalancerStatus represents the status of a load-balancer.
type LoadBalancerStatus struct {
	// Phase indicates if the loadbalancer is pending, running or failed.
	Phase LoadBalancerPhase `json:"phase,omitempty"`
	// A human-readable message indicating details about why the loadbalancer is in this state.
	Message string `json:"message,omitempty"`
	// Ingress is a list containing ingress points for the load-balancer;
	// traffic should be sent to these ingress points.
	Ingress []LoadBalancerIngress `json:"ingress,omitempty"`
}

type LoadBalancerPhase string

const (
	// Used for LoadBalancers that are not available.
	LoadBalancerPending LoadBalancerPhase = "Pending"
	// Used for LoadBalancers that are working well.
	LoadBalancerRunning LoadBalancerPhase = "Running"
	// Used for LoadBalancers that failed to be correctly recycled or deleted after being released from a claim.
	LoadBalancerFailed LoadBalancerPhase = "Failed"
)

// LoadBalancerIngress represents the status of a load-balancer ingress point:
// traffic should be sent to an ingress point.
type LoadBalancerIngress struct {
	// IP is set for load-balancer ingress points that are IP based
	// (typically GCE or OpenStack load-balancers).
	IP string `json:"ip,omitempty"`

	// Hostname is set for load-balancer ingress points that are DNS based
	// (typically AWS load-balancers).
	Hostname string `json:"hostname,omitempty"`
}
```

### LoadbalancerClaim API

```go
type LoadbalancerClaim struct {
	unversioned.TypeMeta `json:",inline"`
	ObjectMeta           `json:"metadata,omitempty"`

	// Spec defines the loadbalancer claimed by the user.
	Spec LoadBalancerClaimSpec `json:"spec,omitempty"`

	// Status represents the current information about the claim.
	Status LoadBalancerClaimStatus `json:"status,omitempty"`
}

type LoadBalancerClaimSpec struct {
	// A label query over loadbalancers to consider for binding. This
	// selector is ignored when LoadBalancerName is set.
	Selector *unversioned.LabelSelector `json:"selector,omitempty"`
	// Resources represents the minimum resources required.
	Resources ResourceRequirements `json:"resources,omitempty"`
	// LoadBalancerName is the binding reference to the LoadBalancer
	// backing this claim. When set to a non-empty value, Selector is
	// not evaluated.
	LoadBalancerName string `json:"loadbalancerName,omitempty"`
}

type LoadBalancerClaimStatus struct {
}
```

### Ingress API Changes

Add a new field `LoadBalancer` to IngressSpec, so that an Ingress can "use" or
"consume" a LoadBalancer, either directly or through a LoadBalancerClaim.

```go
type IngressSpec struct {
	/*
		...
	*/

	LoadBalancer LBSource `json:"loadBalancer"`
}

type LBSource struct {
	GCELoadBalancer       *GCELoadBalancerSource       `json:"gceLoadBalancer,omitempty"`
	AWSLoadBalancer       *AWSLoadBalancerSource       `json:"awsLoadBalancer,omitempty"`
	BareMetalLoadBalancer *BareMetalLoadBalancerSource `json:"bareMetalLoadBalancer,omitempty"`
	LoadBalancerClaim     *LoadBalancerClaimSource     `json:"loadBalancerClaim,omitempty"`
}

type LoadBalancerClaimSource struct {
	// ClaimName is the name of a LoadBalancerClaim in the same namespace
	// as the Ingress using this loadbalancer.
	ClaimName string `json:"claimName"`
}
```
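To show how these pieces compose, here is a hedged example (the object names
are invented) of a claim requesting bandwidth and an Ingress consuming the
loadbalancer bound to that claim, written as Go struct literals over the API
above:

```go
// Hypothetical usage: the user creates a claim for 500M of bandwidth and
// an Ingress that consumes whichever LoadBalancer the claim is bound to.
claim := LoadbalancerClaim{
	ObjectMeta: ObjectMeta{Name: "shop-lb-claim", Namespace: "default"},
	Spec: LoadBalancerClaimSpec{
		Resources: ResourceRequirements{
			Requests: ResourceList{
				ResourceBandWidth: resource.MustParse("500M"),
			},
		},
	},
}

ingress := Ingress{
	ObjectMeta: ObjectMeta{Name: "shop-ingress", Namespace: "default"},
	Spec: IngressSpec{
		// ... L7 rules elided ...
		LoadBalancer: LBSource{
			LoadBalancerClaim: &LoadBalancerClaimSource{ClaimName: "shop-lb-claim"},
		},
	},
}
```

The loadbalancer-controller binds the claim to a matching LoadBalancer (or
provisions one), then programs the Ingress rules onto it.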
## Implementation Detail

### loadbalancer

The current loadbalancer (ingress-controller) implementation has some
limitations when we have multiple loadbalancers:

* we need to deploy multiple ingress-controller Pods even on a cloud provider,
  which results in excessive use of resources. It is actually fine for a
  cluster to have just one process that list&watches all Ingresses and calls
  the cloud provider to update loadbalancing rules.

Thus, we propose to redesign the current ingress-controller as follows: add a
loadbalancer-controller to the controller-manager component, which
list&watches all Ingress resources and calls the loadbalancer provider to
update the loadbalancing rules.

* On a cloud provider

  No extra ingress-controller needs to be deployed.

* On bare-metal
  * put all loadbalancing rules in a ConfigMap
  * the ingress-controller no longer needs to list&watch any Ingress; it just
    notifies the nginx/haproxy process to reload when the ConfigMap is updated
  * the loadbalancer-controller in the controller-manager component
    list&watches all Ingresses and updates the corresponding nginx/haproxy
    ConfigMap

#### loadbalancer controller

Add a loadbalancer-controller to the controller-manager component, which works
as follows:

* bind LoadBalancerClaims to LoadBalancers
* dynamically provision LoadBalancers on demand
* recycle or deprovision a LoadBalancer when no LoadBalancerClaim has been
  bound to it for a long time
* list&watch all Ingresses and call the loadbalancer provider to update L7
  loadbalancing rules
* we put the loadbalancer provider logic in the loadbalancer-controller too.
  On a cloud provider, it just delegates all provider logic to the cloud
  provider; on bare-metal, it:
  * creates a loadbalancer by deploying an nginx/haproxy Pod in the
    "kube-system" namespace
  * updates a loadbalancer by updating the ConfigMap of the nginx/haproxy Pod
  * deletes a loadbalancer by deleting the nginx/haproxy Pod and all its
    related resources (ConfigMap, etc.)

## Implementation plan

### First step: make it workable

* implement the scheduling, provisioning and recycling logic first
* the ingress-controller works as it does today, except that it list&watches
  only the Ingresses assigned to it, instead of all Ingresses

### Second step: loadbalancer provider

* add the loadbalancer provider to the loadbalancer-controller
* refactor the current ingress-controller implementation as described in the
  "Implementation Detail" section, and rename "ingress-controller" to
  "nginx-loadbalancer"

### Long term

* loadbalancer scheduling over-commitment
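To illustrate the long-term direction, network resources could reuse the
request/limit split from compute resources; a purely speculative sketch, not
part of the API proposed above:

```go
// Hypothetical: a claim that is guaranteed 500M of bandwidth but may
// burst to 1G. A scheduler could then over-commit a loadbalancer's
// capacity against requests rather than limits, as is done for compute
// resources today.
burstable := LoadBalancerClaimSpec{
	Resources: ResourceRequirements{
		Requests: ResourceList{ResourceBandWidth: resource.MustParse("500M")},
		Limits:   ResourceList{ResourceBandWidth: resource.MustParse("1G")},
	},
}
```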