add loadbalancer and loadbalancerclaim proposal #275
# Ingress, LoadBalancer and LoadBalancerClaim proposal

**Authors**: @mqliang, @ddysher

Nov 2016
## Overview

This proposal defines a set of APIs to abstract loadbalancer resources, similar to how PersistentVolume (PV) and PersistentVolumeClaim (PVC) work. In the end, we'll

* introduce a new API object `LoadBalancer` to represent a loadbalancer (think of PV)
* introduce a new API object `LoadBalancerClaim` to claim a new or existing LoadBalancer (think of PVC)
* change the existing Ingress API to include a `LBSource` field, which either directly uses a LoadBalancer, or goes through a LoadBalancerClaim (think of a Pod using a PV or PVC)

We will also introduce network resources such as bandwidth and iops. LoadBalancerClaim uses these attributes to claim a LoadBalancer. Though similar to PV/PVC, it is important to know that the LoadBalancer-to-LoadBalancerClaim binding is not 1-to-1; rather, it is 1-to-many. That is, a single LoadBalancer can serve multiple LoadBalancerClaims.

> **Review comment:** What do the IO in iops correspond to? Seems more natural to have requests per second?
## Background

### Current Ingress behavior

> **Review comment:** Could this also be used for services of type LoadBalancer? The lack of explicit tracking of cloud-provider load balancers has been a problem (e.g., kubernetes/kubernetes#32157, kubernetes/kubernetes#15203).
Ingress can be used to expose a service in the kubernetes cluster:

* cluster admin deploys an ingress-controller Pod beforehand
* user creates an Ingress resource
* the ingress-controller Pod lists & watches **all** Ingress resources in the cluster; when it sees a new Ingress resource:
  * on a cloud provider, it calls the cloud provider to sync the Ingress L7 loadbalancing rules
  * on bare-metal, it syncs the nginx (or haproxy, etc.) config and then reloads
* a user outside the cluster can then access services in the cluster:
  * on bare-metal, by accessing the IP of the node on which the ingress-controller Pod is running; the ingress-controller Pod forwards requests into the cluster based on the rules defined in the Ingress resource
  * on a cloud provider, by accessing the IP provided by the cloud provider loadbalancer; the cloud provider forwards requests into the cluster based on the rules defined in the Ingress resource
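For readers less familiar with Ingress, here is a minimal example of the kind of Ingress resource being discussed, written against the existing extensions Ingress types; the host, path and service names are made up for illustration:

```go
// A plain Ingress that routes foo.example.com/ to the "frontend-svc" Service.
// This is the existing API; the proposal below only adds to it.
ingress := Ingress{
    ObjectMeta: ObjectMeta{Name: "frontend", Namespace: "default"},
    Spec: IngressSpec{
        Rules: []IngressRule{{
            Host: "foo.example.com",
            IngressRuleValue: IngressRuleValue{
                HTTP: &HTTPIngressRuleValue{
                    Paths: []HTTPIngressPath{{
                        Path: "/",
                        Backend: IngressBackend{
                            ServiceName: "frontend-svc",
                            ServicePort: intstr.FromInt(80),
                        },
                    }},
                },
            },
        }},
    },
}
```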
### Limitations of current Ingress implementation

* How many ingress-controller Pods should run in a cluster? Should every ingress-controller Pod list & watch all Ingress resources? There is no way to bind or schedule an Ingress resource to an ingress-controller Pod, which results in:
  * insufficient or excessive use of networking resources
  * reload storms when updating Ingress resources, or due to Pod changes

> **Review discussion:**
>
> One Ingress controller handles the setup for all services of a certain class in the cluster. I understand why you would run multiple kinds of Ingress controllers (one for handling each type), but why would we run more than that?
>
> It's mainly because a single L7 loadbalancer may have an upper limit on some resource (e.g. bandwidth). For example, a single L7 loadbalancer on Aliyun has an upper limit of 0.5G bandwidth for incoming requests. It's impossible to let a single ingress controller serve all Ingresses. So, instead of letting a single "Aliyun" class ingress-controller serve all "Aliyun" class Ingresses, I want to deploy multiple "Aliyun" class ingress-controllers, and have some "scheduling" or "binding" logic to assign Ingresses to ingress-controllers.
>
> It seems like an implementation detail as to how many pods are used to implement the controller logic; it may not be 1-1 as you are suggesting here. It's not clear to me why we should expose this to the user of k8s. We should also be careful to distinguish between the "ingress" spec which describes an L7 exposed service and the actual LBs that implement the service. Does each Aliyun LB Ingress have a different IP? Or do they share the same IP to the external user?
>
> Yes, each Aliyun L7 LB has a different IP. See the documents here: https://www.alibabacloud.com/help/doc-detail/32459.htm?spm=a3c0i.o32460en.b99.6.387f4cf8yNv1ao I understand that the aws/gce LB is very powerful; basically you can send as much traffic to the LB as you want and the LB itself will scale up/down accordingly. There is no maximum bandwidth defined for a single LB on aws/gce. However, on other cloud platforms and on bare-metal, the L7 LB itself is not horizontally scalable.

* Ingress resources are actually internet L7 loadbalancing rules; intranet L7 loadbalancing rules have not been supported yet. Eventually, we need a general mechanism for both Ingress and intranet L7 lb rules to "consume" a loadbalancer.

> **Review discussion:**
>
> This seems to not be relevant to the proposal? (it's a separate feature I would assume)
>
> Yes, it would be a separate feature.

* On bare-metal, it does not provide high availability, because clients need to know the IP address of the node where the ingress-controller Pod is running. In case of a failure, the ingress-controller Pod will be moved to a different node.

> **Review discussion:**
>
> Why would the ingress controller IP need to be known? Its role is to watch the API server and do the control flow work to create the load balancer, rather than be the load balancer itself?
>
> +1. Also, do you need the controller pod IP or the node IP? If it's the controller pod IP, you can use a kube service instead, in which case rescheduling to a different node should not matter.
## Goal and NonGoal

### Goal

* Define the `LoadBalancer` API
* Define the `LoadBalancerClaim` API
* Define network attributes of LoadBalancer
* Define the loadbalancer provider: what it is and how it works
* Define loadbalancer scheduling

### NonGoal

* LoadBalancer HA on bare-metal
* LoadBalancerClass: different types (internet or intranet), different qos levels, etc.
* LoadBalancer scheduling over-commitment

> **Review discussion:**
>
> Confused -- this seems to appear both in the goal and non-goal sections.
>
> Apologies for the misleading. Simple scheduling logic (assigning Ingresses to ingress-controllers) is the goal. Scheduling over-commitment is currently not the goal.
## Design

### LoadBalancer

`LoadBalancer` is a first-class API in kubernetes. It represents network resources for internet and intranet loadbalancing. A LoadBalancer will eventually be 'used' or 'consumed' via Ingress resources, which basically define forwarding rules to a set of Pods. Different LoadBalancers have different network attributes (bandwidth, iops, etc.).

### LoadBalancerClaim

`LoadBalancerClaim` is also a first-class API in kubernetes, and as its name suggests, it is used to claim a LoadBalancer. A LoadBalancerClaim claims a LoadBalancer based on the network attributes mentioned above. If no LoadBalancer satisfies a claim, a new one can be created on the fly, just like how a PV is dynamically provisioned.

For more background see https://github.com/kubernetes/kubernetes/issues/30151
### LoadBalancer Provider Interface

The LoadBalancer Provider Interface is an interface to create/update/delete LoadBalancers. There can be multiple LoadBalancer provider implementations, such as:

* AWS loadbalancer provider
* GCE loadbalancer provider
* bare-metal nginx loadbalancer provider
* bare-metal highly-available nginx loadbalancer provider
* bare-metal haproxy loadbalancer provider
* bare-metal highly-available haproxy loadbalancer provider

> **Review comment:** Can you give some more details what the provider interface will look like? What functions will it support?
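The proposal does not yet spell out the provider interface (see the review comment above). As a rough, hypothetical sketch only — the method set and signatures below are assumptions, not part of the proposal — it could look something like this, with `Ingress` referring to the existing extensions Ingress type:

```go
// LoadBalancerProvider is a hypothetical sketch of the provider contract;
// the exact methods and signatures are still to be decided.
type LoadBalancerProvider interface {
    // CreateLoadBalancer provisions a new loadbalancer with the given capacity
    // (e.g. network-bandwidth, network-iops).
    CreateLoadBalancer(name string, capacity ResourceList) (*LoadBalancer, error)

    // UpdateLoadBalancer syncs the L7 loadbalancing rules of an existing
    // loadbalancer from the Ingresses that consume it.
    UpdateLoadBalancer(lb *LoadBalancer, ingresses []Ingress) error

    // DeleteLoadBalancer tears down the loadbalancer and any resources
    // (pods, configmaps, cloud objects) backing it.
    DeleteLoadBalancer(lb *LoadBalancer) error
}
```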
### Loadbalancer-controller

The loadbalancer-controller is responsible for:

* listing & watching Ingress resources and calling the loadbalancer provider to update the corresponding loadbalancer's loadbalancing rules
* picking the best-matching LoadBalancer from the existing LoadBalancer pool for a LoadBalancerClaim, based on the network attributes requested by the LoadBalancerClaim
* calling the loadbalancer provider to dynamically provision a LoadBalancer for a LoadBalancerClaim when it cannot find a matching one among existing LoadBalancers
* recycling or deprovisioning a LoadBalancer when it has no consumers

> **Review discussion:**
>
> Is the call to the LB provider an in process call?
>
> good point
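To make the binding and dynamic-provisioning responsibilities concrete, here is a minimal, hypothetical sketch (not part of the proposal): it uses the provider interface sketched earlier, considers only bandwidth, does first-fit rather than true best-fit, and assumes the controller tracks elsewhere how much of each LoadBalancer's bandwidth is already promised to other claims.

```go
// syncClaim binds a claim to the first existing LoadBalancer with enough free
// bandwidth, or falls back to dynamic provisioning. Names and the usedBandwidth
// bookkeeping are assumptions made for this sketch.
func syncClaim(claim *LoadBalancerClaim, lbs []LoadBalancer,
    usedBandwidth map[string]resource.Quantity, provider LoadBalancerProvider) (*LoadBalancer, error) {

    request := claim.Spec.Resources.Requests[ResourceBandWidth]
    for i := range lbs {
        capacity := lbs[i].Spec.Capacity[ResourceBandWidth]
        used := usedBandwidth[lbs[i].Name]
        // bind the claim to the first LoadBalancer with enough free bandwidth
        if capacity.Value()-used.Value() >= request.Value() {
            return &lbs[i], nil
        }
    }
    // no existing LoadBalancer satisfies the claim: dynamically provision one
    return provider.CreateLoadBalancer(claim.Name, claim.Spec.Resources.Requests)
}
```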
### Loadbalancer scheduling

As mentioned before, the LoadBalancer and LoadBalancerClaim binding is not exclusive, which means multiple LoadBalancerClaims can be bound to one LoadBalancer. For example, if we have a LoadBalancer with 3G bandwidth, we can bind 6 LoadBalancerClaims, each requesting 500M bandwidth, to it. In such a case, we need a 'scheduling' logic.

Further, we can eventually introduce the 'request/limit' model for network resources to achieve functionality already implemented for compute resources, for example, qos and overcommit.
#### Manually assign LoadBalancerClaim to a LoadBalancer

Users can also manually assign a LoadBalancerClaim to a LoadBalancer, instead of letting the loadbalancer-controller schedule it for them. In such a case, the resource requests of all loadbalancerclaims may exceed the loadbalancer's capacity. We validate request against capacity when the loadbalancer-controller updates the loadbalancing rules: sort all Ingresses that "consume" the same LoadBalancer by creation time; if the summed requests exceed the loadbalancer's capacity, avoid updating rules for the last few Ingresses and send an event. This is just like how we validate requests on the kubelet's side for Pods.
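A minimal sketch of that validation, under the assumption that the bandwidth requests live on the claims bound to the LoadBalancer; the function and variable names are made up and event emission is omitted:

```go
// claimsWithinCapacity is a hypothetical helper: claims bound to the same
// LoadBalancer are ordered by creation time, and only the prefix whose summed
// bandwidth requests fit within the capacity keeps its loadbalancing rules;
// rules for the rest would be skipped and an event emitted (not shown).
func claimsWithinCapacity(lb *LoadBalancer, claims []LoadBalancerClaim) (admitted, rejected []LoadBalancerClaim) {
    sort.Slice(claims, func(i, j int) bool {
        return claims[i].CreationTimestamp.Time.Before(claims[j].CreationTimestamp.Time)
    })
    capacity := lb.Spec.Capacity[ResourceBandWidth]
    var used int64
    for i := range claims {
        request := claims[i].Spec.Resources.Requests[ResourceBandWidth]
        used += request.Value()
        if used > capacity.Value() {
            // everything from here on exceeds capacity: skip its rules
            return claims[:i], claims[i:]
        }
    }
    return claims, nil
}
```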
## API

### Network Resource

```go
const (
    ResourceBandWidth ResourceName = "network-bandwidth"
    ResourceIOPS      ResourceName = "network-iops"
)
```

We can introduce more network resources in the future.
### LoadBalancer API

```go
type LoadBalancer struct {
    unversioned.TypeMeta `json:",inline"`
    ObjectMeta           `json:"metadata,omitempty"`

    // Spec defines a loadbalancer owned by the cluster
    Spec LoadBalancerSpec `json:"spec,omitempty"`

    // Status represents the current information about loadbalancer.
    Status LoadBalancerStatus `json:"status,omitempty"`
}

type LoadBalancerSpec struct {
    // Capacity represents the actual resources of the loadbalancer
    Capacity ResourceList `json:"capacity"`
    // Source represents the location and type of a loadbalancer to use.
    LoadBalancerSource `json:",inline"`
}

type LoadBalancerSource struct {
    GCELoadBalancer       *GCELoadBalancerSource       `json:"gceLoadBalancer,omitempty"`
    AWSLoadBalancer       *AWSLoadBalancerSource       `json:"awsLoadBalancer,omitempty"`
    BareMetalLoadBalancer *BareMetalLoadBalancerSource `json:"bareMetalLoadBalancer,omitempty"`
    /*
        more loadbalancer sources
    */
}

type GCELoadBalancerSource struct {
    // Unique name of the LoadBalancer resource. Used to identify the LoadBalancer in GCE
    LBName string `json:"lbName"`
}

type AWSLoadBalancerSource struct {
    // Unique name of the LoadBalancer resource. Used to identify the LoadBalancer in AWS
    LBName string `json:"lbName"`
}

type BareMetalLoadBalancerSource struct {
    Type BareMetalLoadBalancerType `json:"type"`
    // A BareMetalLoadBalancer is actually an nginx or haproxy app deployed in the "kube-system" namespace
    ServiceName string `json:"serviceName"`
}

type BareMetalLoadBalancerType string

const (
    NginxLoadBalancer   BareMetalLoadBalancerType = "Nginx"
    HaproxyLoadBalancer BareMetalLoadBalancerType = "Haproxy"
)

// LoadBalancerStatus represents the status of a load-balancer
type LoadBalancerStatus struct {
    // Phase indicates if a loadbalancer is pending, running or failed
    Phase LoadBalancerPhase `json:"phase,omitempty"`
    // A human-readable message indicating details about why the loadbalancer is in this state.
    Message string `json:"message,omitempty"`
    // Ingress is a list containing ingress points for the load-balancer;
    // traffic should be sent to these ingress points.
    Ingress []LoadBalancerIngress `json:"ingress,omitempty"`
}

type LoadBalancerPhase string

const (
    // used for LoadBalancers that are not available
    LoadBalancerPending LoadBalancerPhase = "Pending"
    // used for LoadBalancers that are working well
    LoadBalancerRunning LoadBalancerPhase = "Running"
    // used for LoadBalancers that failed to be correctly recycled or deleted after being released from a claim
    LoadBalancerFailed LoadBalancerPhase = "Failed"
)

// LoadBalancerIngress represents the status of a load-balancer ingress point:
// traffic should be sent to an ingress point.
type LoadBalancerIngress struct {
    // IP is set for load-balancer ingress points that are IP based
    // (typically GCE or OpenStack load-balancers)
    IP string `json:"ip,omitempty"`

    // Hostname is set for load-balancer ingress points that are DNS based
    // (typically AWS load-balancers)
    Hostname string `json:"hostname,omitempty"`
}
```

> **Review comment (on GCELoadBalancerSource):** Does it make sense to put the cloud provider specific pieces in this proposal? I would expect the API to not mention any cloud providers explicitly.
>
> **Review comment (on the BareMetalLoadBalancerType constants):** See comment above about cloud providers. The feature should be load balancer type agnostic.
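For illustration only, here is a hypothetical bare-metal LoadBalancer object built from the types above; the name and quantities are made up:

```go
// A bare-metal nginx LoadBalancer advertising 3G of bandwidth and 10k iops.
lb := LoadBalancer{
    ObjectMeta: ObjectMeta{Name: "nginx-lb-1"},
    Spec: LoadBalancerSpec{
        Capacity: ResourceList{
            ResourceBandWidth: resource.MustParse("3G"),
            ResourceIOPS:      resource.MustParse("10000"),
        },
        LoadBalancerSource: LoadBalancerSource{
            BareMetalLoadBalancer: &BareMetalLoadBalancerSource{
                Type:        NginxLoadBalancer,
                ServiceName: "nginx-lb-1",
            },
        },
    },
}
```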
### LoadBalancerClaim API

```go
type LoadBalancerClaim struct {
    unversioned.TypeMeta `json:",inline"`
    ObjectMeta           `json:"metadata,omitempty"`

    // Spec defines the loadbalancer requested by this claim
    Spec LoadBalancerClaimSpec `json:"spec,omitempty"`

    // Status represents the current information about the claim.
    Status LoadBalancerClaimStatus `json:"status,omitempty"`
}

type LoadBalancerClaimSpec struct {
    // A label query over loadbalancers to consider for binding. This selector is
    // ignored when LoadBalancerName is set
    Selector *unversioned.LabelSelector `json:"selector,omitempty"`
    // Resources represents the minimum resources required
    Resources ResourceRequirements `json:"resources,omitempty"`
    // LoadBalancerName is the binding reference to the LoadBalancer backing this
    // claim. When set to a non-empty value, Selector is not evaluated
    LoadBalancerName string `json:"loadbalancerName,omitempty"`
}

type LoadBalancerClaimStatus struct {
}
```
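For illustration, a hypothetical claim requesting 500M of bandwidth; names and quantities are made up:

```go
// A claim that asks the loadbalancer-controller for at least 500M of bandwidth.
claim := LoadBalancerClaim{
    ObjectMeta: ObjectMeta{Name: "frontend-lb-claim", Namespace: "default"},
    Spec: LoadBalancerClaimSpec{
        Resources: ResourceRequirements{
            Requests: ResourceList{
                ResourceBandWidth: resource.MustParse("500M"),
            },
        },
    },
}
```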
### Ingress API Changes

Add a new field `LoadBalancer` to IngressSpec, so that an Ingress can "use" or "consume" a LoadBalancer, either directly or through a LoadBalancerClaim.

```go
type IngressSpec struct {
    /*
        ...
    */

    LoadBalancer LBSource `json:"loadBalancer"`
}

type LBSource struct {
    GCELoadBalancer       *GCELoadBalancerSource       `json:"gceLoadBalancer,omitempty"`
    AWSLoadBalancer       *AWSLoadBalancerSource       `json:"awsLoadBalancer,omitempty"`
    BareMetalLoadBalancer *BareMetalLoadBalancerSource `json:"bareMetalLoadBalancer,omitempty"`
    LoadBalancerClaim     *LoadBalancerClaimSource     `json:"loadBalancerClaim,omitempty"`
}

type LoadBalancerClaimSource struct {
    // ClaimName is the name of a LoadBalancerClaim in the same namespace as the Ingress using this lb
    ClaimName string `json:"claimName"`
}
```
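Continuing the illustration, a hypothetical Ingress that consumes the claim shown earlier through the new field; the existing IngressSpec fields are elided:

```go
// An Ingress that consumes a LoadBalancer indirectly via a LoadBalancerClaim.
ingress := Ingress{
    ObjectMeta: ObjectMeta{Name: "frontend", Namespace: "default"},
    Spec: IngressSpec{
        // ... existing Rules / Backend / TLS fields unchanged ...
        LoadBalancer: LBSource{
            LoadBalancerClaim: &LoadBalancerClaimSource{
                // references a LoadBalancerClaim in the same namespace
                ClaimName: "frontend-lb-claim",
            },
        },
    },
}
```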
## Implementation Detail

### loadbalancer

The current loadbalancer (ingress-controller) implementation has some limitations when we have multiple loadbalancers:

* we need to deploy multiple ingress-controller pods even on a cloud provider, which results in some excessive use of resources. Actually it is fine for a cluster to have just one process which lists & watches all Ingresses and calls the cloud provider to update loadbalancing rules.

> **Review discussion:**
>
> Why do we need to deploy multiple ingress controller pods? Do you mean multiple instances of the same controller (for HA) or different pods for each ingress class? If it's the second, then the admin can choose to deploy the ingress controllers only for the ingress classes they want to enable?
>
> I want to deploy multiple pods even for the same ingress class. It's mainly because a single L7 loadbalancer may have an upper limit on some resource (e.g. bandwidth). For example, a single L7 loadbalancer on Aliyun has an upper limit of 0.5G bandwidth for incoming requests. It's impossible to let a single ingress controller serve all Ingresses. So, I want to deploy multiple aliyun-ingress-controller pods for the "aliyun" class, and have a "scheduling" logic to assign Ingresses to ingress-controllers. In other words, there are multiple aliyun ingress-controller pods, each pod serving a subset of "aliyun" Ingresses. I understand your and @bowei's opinion: the GCE-ingress-controller serves all "gce" class Ingresses, the nginx-ingress-controller serves all "nginx" class Ingresses, etc. It works fine on GCE and AWS, since the GCE/AWS lb itself is designed to handle the incoming traffic and scale up/down accordingly, which means the GCE/AWS lb has no maximum bandwidth limit. So we can deploy a GCE ingress-controller pod to serve all "gce" class Ingresses. However, on other cloud platforms, like Aliyun, an lb has a bandwidth limit for incoming requests. So, for "aliyun" class Ingresses, we need to deploy multiple aliyun-ingress-controller pods (thus, multiple L7 lbs on IaaS).
>
> Proposal should probably clarify the distinction between horizontally autoscaling a collection of homogeneous ingress pods (likely the simpler and more common use-case) vs sharding ingress resources among distinct controllers or groups of pods. Can you clarify why your case isn't covered by horizontal scaling?
>
> Thanks for that great explanation @mqliang! I agree that we will need multiple LB pods (due to the bandwidth limitation in the Aliyun example) but it is still not clear to me why we need multiple ingress controller pods. A single aliyun ingress controller can spin up multiple aliyun LB pods?
>
> @louiscryan In terms of horizontal scaling, it's complex: since the LB itself is NOT horizontally scalable on some cloud providers (like Aliyun) and on bare-metal, we need a mechanism to shard ingress resources among distinct LBs.
>
> @nikhiljindal You are right: what we actually need is multiple LBs, and a mechanism to shard ingress resources among distinct LBs. And this is exactly what I want to do in the long term. But the current ingress-controller implementation (especially the nginx-ingress-controller) actually puts the ingress-controller (list & watch logic) and the lb (the nginx that handles the incoming traffic) together; in other words, currently the ingress-controller and loadbalancer have a 1-to-1 relationship, so we can just assign Ingresses to ingress-controllers (this also makes the smallest modification to the current ingress-controller implementation).
>
> Just fyi, you can have multiple pod-lbs all listing & watching a single ingress resource and adding their IPs to the list of ingress IPs. This is how you deploy > 1 replicas (https://github.com/kubernetes/ingress/blob/master/examples/scaling-deployment/nginx/nginx-ingress-deployment.yaml). You need a cloud-lb on top of this to choose between the IPs, or just DNS, or a smart client.
>
> Reading this, it seems that the goal of this design is to reduce the number of list&watch on the apiserver. Instead of all ingress controllers doing their own list&watch, the proposed loadbalancer controller will be the only one performing list&watch. Is that true? If yes then we should state that goal explicitly at the top.

Thus, we propose to redesign the current ingress-controller as follows:

Add a loadbalancer-controller in the controller-manager component, which lists & watches all Ingress resources and calls the loadbalancer provider to update the loadbalancing rules.

* On cloud provider

  No extra ingress-controller needs to be deployed.

* On bare-metal
  * put all loadbalancing rules in a configmap
  * the ingress-controller does not need to list & watch any Ingress anymore; it just notifies the nginx/haproxy process to reload when the configmap is updated
  * the loadbalancer-controller in the controller-manager component will list & watch all Ingresses and update the corresponding nginx/haproxy's configmap

> **Review comment:** Using a configmap adds an extra layer in the update path. Does this complexity make sense? How would update latency be impacted?
#### loadbalancer controller

Add a loadbalancer-controller in the controller-manager component, which works as follows:

* bind loadbalancerclaims with loadbalancers
* dynamically provision loadbalancers on demand
* recycle or deprovision a loadbalancer when no loadbalancerclaim has been bound to it for a long time
* list & watch all Ingresses and call the loadbalancer provider to update L7 loadbalancing rules
* we put the loadbalancer provider logic in the loadbalancer-controller too. On a cloud provider, it just delegates all loadbalancer provider logic to the cloud provider; on bare-metal, it (see the sketch after this list):
  * creates a loadbalancer by deploying an nginx/haproxy pod in the "kube-system" namespace
  * updates a loadbalancer by updating the configmap of the nginx/haproxy pod
  * deletes a loadbalancer by deleting the nginx/haproxy pod and all its related resources (such as configmaps, etc.)

> **Review comment:** When we provide dynamic provisioning it could make sense to provide the ability to define a persistent IP, either as an annotation or as a field.
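As a rough illustration of the bare-metal path, this hypothetical sketch shows only the rules ConfigMap such a provider might manage; the object name, key name and contents are made up, and the nginx Deployment plus the actual API calls are omitted:

```go
// Rendered nginx config for a loadbalancer named "nginx-lb-1"; a real
// implementation would template this from all Ingresses bound to it via claims.
renderedConfig := "# nginx.conf generated from Ingress rules for nginx-lb-1"

// The ConfigMap the loadbalancer-controller would keep in sync; the nginx pod
// only watches this ConfigMap and reloads when it changes.
rulesConfigMap := ConfigMap{
    ObjectMeta: ObjectMeta{
        Name:      "nginx-lb-1-rules",
        Namespace: "kube-system",
    },
    Data: map[string]string{
        "nginx.conf": renderedConfig,
    },
}
```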
## Implementation plan

### First step: make it workable

* implement the scheduling, provisioning and recycling logic first
* the ingress-controller works just as it does currently, but it only lists & watches the Ingresses assigned to it, instead of all Ingresses

### Second step: loadbalancer provider

* add a loadbalancer provider in the loadbalancer-controller
* refactor the current ingress-controller implementation as described in the "Implementation Detail" section and rename "ingress-controller" to "nginx-loadbalancer"

### Long term

* loadbalancer scheduling over-commitment
> **Review comment:** Please explain "PV" and "PVC" at first use. [I'm guessing you mean PersistentVolume and PersistentVolumeClaim]