add loadbalancer and loadbalancerclaim proposal #275
Conversation
|
Thanks for your pull request. Before we can look at your pull request, you'll need to sign a Contributor License Agreement (CLA). 📝 Please follow instructions at https://github.com/kubernetes/kubernetes/wiki/CLA-FAQ to sign the CLA. Once you've signed, please reply here (e.g. "I signed it!") and we'll verify. Thanks.
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. I understand the commands that are listed here. |
| * we need to deploy multiple ingress-controller pod even on cloud provider, it | ||
| results in some excessive use of resource. Actually it fine that a cluster | ||
| just has one process which list&watch all ingresses, and call cloud provider | ||
| provider to update loadbalancing rules. |
Spell check: Duplicate words provider
|
|
||
| * implement the scheduling, provisioning and recycling logic first. | ||
| * ingress-controller just works as currently, but it just list&watch ingresses | ||
| assigned to it, insteal all ingresses |
Spell check: insteal
Nice Specifications.
|
👍 |
| As mentioned before, LoadBalancer and LoadBalancerClaim binding is not exclusive, | ||
| which means multiple LoadBalancerClaim can be bound to one LoadBalancer. For | ||
| example, if we have a LoadBalancer with 3G bandwidth, we can bind 6 LoadBalancerClaim | ||
| each request 500m bandwidth on it. In such case, we need to a 'scheduling' logic. |
Typo: "we need to a 'scheduling' logic" → "we need a 'scheduling' logic"
| each request 500m bandwidth on it. In such case, we need to a 'scheduling' logic. | ||
|
|
||
| Further, we can eventually introduce the 'request/limit model' for network resource | ||
| to acheieve functionalities already implemented in compute resource, for example, |
*achieve
| * ingress-controller not need list&watch any Ingress anymore. Just notify | ||
| the nginx/haproxy process to reload when configmap was updated. | ||
| * loadbalancer-controller in controller-manager component will list&watch | ||
| all Ingress and update corresponding nginx/haproxy's configmap |
Using a configmap adds an extra layer in the update path. Does this complexity make sense? How would update latency be impacted?
What about LB implementations that have the controller (list&watch component) built in such as traefik?
| follows: | ||
|
|
||
| * bind loadbalancerclaim with loadbalancer | ||
| * dynamically provision loadbalancer on demand |
When we provide dynamic provisioning, it could make sense to also provide the ability to define a persistent IP, either as an annotation or as a field.
|
@mqliang Has this been presented to SIG network? |
| * introduce a new API object `LoadBalancer` to represent a loadbalancer (think of PV) | ||
| * introduce a new API object `LoadBalancerClaim` to claim a new or existing | ||
| LoadBalancer (think of PVC) | ||
| * change existing Ingress API to include a `LBSource` field, which either directly |
|
|
||
| ## Background | ||
|
|
||
| ### Current Ingress behavior |
Could this also be used for services of type LoadBalancer?
The lack of explicit tracking of cloud-provider load balancers has been a problem (e.g., kubernetes/kubernetes#32157, kubernetes/kubernetes#15203)
Will discuss this at the forthcoming SIG-network meeting. |
| ## Overview | ||
|
|
||
| The proposal aims to define a set of APIs to abstract loadbalancer resource, | ||
| similar to how PV/PVC works. In the end, we'll |
Please explain "PV" and "PVC" at first use.
[I'm guessing you mean PersistentVolume and PersistentVolumeClaim]
| to acheieve functionalities already implemented in compute resource, for example, | ||
| qos and overcommit. | ||
|
|
||
| ##### Manually assign LoadBalancerClaim to a LodBalancer |
typo: Lod->Load
| qos and overcommit. | ||
|
|
||
| ##### Manually assign LoadBalancerClaim to a LodBalancer | ||
| User can also manually assign LoadBalancerClaim to a LodBalancer, instead of letting |
typo: Lod->Load
|
Could you update the information in https://docs.google.com/spreadsheets/d/1lHSZBl7YJvKN0qNT8i0oxYaRh38CcemJlUFjD37hhDU for SIG-NETWORK status? |
|
Thanks for your pull request. Before we can look at your pull request, you'll need to sign a Contributor License Agreement (CLA). 📝 Please follow instructions at https://github.com/kubernetes/kubernetes/wiki/CLA-FAQ to sign the CLA. It may take a couple minutes for the CLA signature to be fully registered; after that, please reply here with a new comment and we'll verify. Thanks.
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. I understand the commands that are listed here. |
|
@mqliang Is this something you want to try to press forward in 1.8? |
|
@thockin loadbalancer/loadbalancerclaim is very helpful for automating the ingress-controller deployment process. And I have some bandwidth in 1.8, so I can get involved to press it forward if the community thinks what I proposed is worth doing. BTW, this proposal deserves more discussion. |
|
cc @louiscryan |
|
cc @bowei |
| * change existing Ingress API to include a `LBSource` field, which either directly | ||
| uses a LoadBalancer, or via a LoadBalancerClaim (think of Pod using PV or PVC) | ||
|
|
||
| We will also introduce network resource such as bandwidth and iops. LoadBalancerClaim |
What does the "IO" in iops correspond to?
It seems more natural to use requests per second.
| bind or schedule Ingress resource to a ingress-controller Pod, which result in: | ||
| * insufficient or excessive use of neworking resource | ||
| * reload storm when updating Ingress resource, or due to Pod changes | ||
| * Ingress resource is actually internet l7 loadbalancing rules, intranet l7 |
This seems to not be relevant to the proposal? (it's a separate feature I would assume)
Yes, it would be a separate feature
|
|
||
| * How many ingress-controller Pods should run in a cluster? Should all | ||
| ingress-controller Pod list&watch all Ingress resource? There is no way to | ||
| bind or schedule Ingress resource to a ingress-controller Pod, which result in: |
One Ingress controller handles the setup for all services of a certain class in the cluster. I understand why you would run multiple kinds of Ingress controllers (one for handling each type), but why would we run more than that?
It's mainly because a single L7 loadbalancer may have an upper limit on some resource (e.g. bandwidth). For example, a single L7 loadbalancer on Aliyun has an upper limit of 0.5G bandwidth for incoming requests, so it's impossible for a single ingress controller to serve all ingresses.
So, instead of letting a single "Aliyun" class ingress-controller serve all "Aliyun" class ingresses, I want to deploy multiple "Aliyun" class ingress-controllers and have some "scheduling" or "binding" logic to assign ingresses to ingress-controllers.
It seems like an implementation detail how many pods are used to implement the controller logic; it may not be 1-1 as you are suggesting here. It's not clear to me why we should expose this to the user of k8s.
We should also be careful to distinguish between the "ingress" spec, which describes an L7 exposed service, and the actual LBs that implement the service.
Does each Aliyun LB Ingress have a different IP? Or do they share the same IP to the external user?
Does each Aliyun LB Ingress have a different IP? Or do they share the same IP to the external user?
Yes, each Aliyun L7 LB has a different IP. See the documentation here: https://www.alibabacloud.com/help/doc-detail/32459.htm?spm=a3c0i.o32460en.b99.6.387f4cf8yNv1ao
I understand that the aws/gce LB is very powerful: basically you can send as much traffic to the LB as you like and the LB itself will scale up/down accordingly. There is no maximum bandwidth defined for a single LB on aws/gce.
However, on other cloud platforms and on bare-metal, the L7 LB itself is not horizontally scalable.
| loadbalancing rules has not been supported yet. Eventually, We need a general | ||
| mechanism for both the Ingress and intranet L7 lb rules "consume" a | ||
| loadbalancer | ||
| * On bare-metal, it does not provide High Availability because client needs |
Why would the ingress controller IP need to be known? Its role is to watch the API server and do the control flow work to create the load balancer, rather than be the load balancer itself?
+1. Also, do you need the controller pod IP or the node IP? If it's the controller pod IP, you can use a kube service instead, in which case rescheduling to a different node should not matter.
|
|
||
| LoadBalancer Provider Interface is an interface to Create/Update/Delete LoadBalancer. | ||
| There can be multiple LoadBalancer provider implementations, such as: | ||
| * AWS loadbalancer provider |
Can you give some more details what the provider interface will look like? What functions will it support?
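For example, I'd expect something roughly along these lines (purely illustrative; none of these names or signatures are from the proposal):

```go
// Hypothetical sketch of a LoadBalancer provider interface; names and
// signatures are illustrative only, not taken from the proposal.
package lbprovider

import "context"

// ForwardingRule is a simplified L7 rule (host/path -> backend).
type ForwardingRule struct {
	Host        string
	Path        string
	BackendName string
	BackendPort int
}

// LoadBalancerSpec is the desired state handed to a provider: rules derived
// from Ingress objects plus resource attributes such as bandwidth.
type LoadBalancerSpec struct {
	Name          string
	BandwidthMbps int
	Rules         []ForwardingRule
}

// Provider is what a cloud- or software-specific implementation
// (GCE, AWS, Aliyun, nginx, haproxy, ...) would satisfy.
type Provider interface {
	// Create provisions a new load balancer and returns its address.
	Create(ctx context.Context, spec LoadBalancerSpec) (address string, err error)
	// Update reconciles an existing load balancer with the desired spec.
	Update(ctx context.Context, name string, spec LoadBalancerSpec) error
	// Delete tears down the load balancer and releases its resources.
	Delete(ctx context.Context, name string) error
	// Status reports the observed address and readiness of the load balancer.
	Status(ctx context.Context, name string) (address string, ready bool, err error)
}
```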
|
|
||
| * LoadBalancer HA on bare-metal | ||
| * LoadBalancerClass: different types (internet or intranet), different qos level, etc | ||
| * LoadBalancer scheduling and over-commitment |
Confused -- this seems to appear both in the goal and non-goal sections.
Apologies for the confusion. Simple scheduling logic (assigning ingresses to ingress-controllers) is a goal; scheduling over-commitment is currently not a goal.
| */ | ||
| } | ||
|
|
||
| type GCELoadBalancerSource struct { |
Does it make sense to put the cloud provider specific pieces in this proposal? I would expect the API to not mention any cloud providers explicitly.
| type BareMetalLoadBalancerType string | ||
|
|
||
| const ( | ||
| NginxLoadBalancer BareMetalLoadBalancerType = "Nginx" |
See comment above about cloud providers. The feature should be load balancer type agnostic.
|
Glad to receive your feedback, @bowei. Actually, I am working with @mqliang to refactor this proposal, and a Google Doc version is available now. @mqliang, please publish the Google Doc link when you think it's ready for review. BTW, I already added an agenda item for the forthcoming SIG-network meeting; we can discuss more details there. Thanks a lot! I am sure I will be back before the SIG-network meeting :) |
|
|
||
| loadbalancer-controller is responsible for: | ||
|
|
||
| * list&watch Ingress resources and call Loadbalancer provider to update |
Is the call to the LB provider an in-process call?
What I like about the existing implementation (of independent ingress controllers for each ingress provider) is that if I want to introduce a new ingress provider in my cluster (for example, ha-proxy), I can pick the ha-proxy ingress controller and just run it in my cluster without affecting other ingress controllers.
With this new design, will I have to change my load balancer controller binary to import the ha-proxy provider as well?
good point
| The current loadbalancer (ingress-controller) implementation has some limitations | ||
| when we have multiple loadbalancers: | ||
|
|
||
| * we need to deploy multiple ingress-controller pod even on cloud provider, it |
Why do we need to deploy multiple ingress controller pods? Do you mean multiple instances of the same controller (for HA) or different pods for each ingress class? If it's the second, then the admin can choose to deploy the ingress controllers only for the ingress classes they want to enable?
Do you mean multiple instances of the same controller (for HA)
No.
different pods for each ingress class.
No.
I want to deploy multiple pods even for the same ingress class. It's mainly because a single L7 loadbalancer may have an upper limit on some resource (e.g. bandwidth). For example, a single L7 loadbalancer on Aliyun has an upper limit of 0.5G bandwidth for incoming requests, so it's impossible for a single ingress controller to serve all ingresses.
So, I want to deploy multiple aliyun-ingress-controller pods for the "aliyun" class, and have a "scheduling" logic to assign ingresses to ingress-controllers. In other words, there are multiple aliyun ingress-controller pods, and each pod serves a subset of the "aliyun" ingresses.
I understand your and @bowei's opinion: the GCE-ingress-controller serves all "gce" class ingresses, the nginx-ingress-controller serves all "nginx" class ingresses, etc. It works fine on GCE and AWS, since the GCE/AWS lb itself is designed to handle the incoming traffic and scale up/down accordingly, which means the GCE/AWS lb has no maximum bandwidth limit. So we can deploy one GCE ingress-controller pod to serve all "gce" class ingresses.
However, on other cloud platforms, like Aliyun, an lb has a bandwidth limit for incoming requests. So, for "aliyun" class ingresses, we need to deploy multiple aliyun-ingress-controller pods (and thus multiple L7 lbs on the IaaS).
The proposal should probably clarify the distinction between horizontally autoscaling a homogeneous set of ingress pods (likely the simpler and more common use-case) vs sharding ingress resources among distinct controllers or groups of pods.
Can you clarify why your case isn't covered by horizontal scaling?
Thanks for that great explanation @mqliang!
I think there is some confusion between the ingress controller pod (which manages LBs for ingresses) and the L7 load balancer pod (which actually balances the user traffic).
I agree that we will need multiple LB pods (due to the bandwidth limitation in the Aliyun example), but it is still not clear to me why we need multiple ingress controller pods. Can a single aliyun ingress controller spin up multiple aliyun LB pods?
@louiscryan In terms of horizontal scaling, it's complex:
- On GCE/AWS: the LB itself is horizontally scalable; basically you can send as much traffic to the LB as you like and the LB will scale up/down accordingly. There is no maximum bandwidth defined for a single LB on aws/gce.
- On other cloud providers (like Aliyun): the LB is not horizontally scalable; a single LB on Aliyun has an upper limit of 0.5G bandwidth for incoming requests.
- On bare-metal: the LB is not horizontally scalable. The LB is implemented using nginx/haproxy and uses Keepalived for HA. But Keepalived is a cold-backup solution, so it's not horizontally scalable (no matter how many LB instances you add, only one instance handles the incoming traffic).
Since the LB itself is NOT horizontally scalable on some cloud providers (like Aliyun) and on bare-metal, we need a mechanism to shard ingress resources among distinct LBs.
@nikhiljindal You are right: what we actually need is multiple LBs, plus a mechanism to shard ingress resources among distinct LBs. And this is exactly what I want to do in the long term.
But the current ingress-controller implementation (especially the nginx-ingress-controller) actually puts the ingress-controller (the list&watch logic) and the lb (the nginx that handles the incoming traffic) together; in other words, currently the ingress-controller and loadbalancer have a 1-to-1 relationship, so we can just assign ingresses to ingress-controllers (this also requires the smallest modification to the current ingress-controller implementation).
the ingress-controller and loadbalancer have a 1-to-1 relationship
Just FYI, you can have multiple pod-lbs all list-watching a single ingress resource and adding their IPs to the list of ingress IPs. This is how you deploy more than 1 replica (https://github.com/kubernetes/ingress/blob/master/examples/scaling-deployment/nginx/nginx-ingress-deployment.yaml). You need a cloud-lb on top of this to choose between the IPs, or just DNS or a smart client.
|
|
||
| * we need to deploy multiple ingress-controller pod even on cloud provider, it | ||
| results in some excessive use of resource. Actually it fine that a cluster | ||
| just has one process which list&watch all ingresses, and call cloud provider |
Reading this, it seems that the goal of this design is to reduce the number of list&watches on the apiserver: instead of all ingress controllers doing their own list&watch, the proposed loadbalancer controller will be the only one performing list&watch.
Is that true? If yes, then we should state that goal explicitly at the top.
To make things more clear:

**The problem**

L7 loadbalancers are NOT horizontally scalable on all platforms:

Since the LB itself is NOT horizontally scalable on some cloud providers (like Aliyun) and on bare-metal, we need a flexible mechanism to:

Of course, we could simply shard ingresses by namespace, but that is not flexible: think of a cluster with many "small" namespaces; we would end up with lots of L7 LBs, which is a waste of resources, and an L7 LB is really a cluster-scoped networking resource which can be shared across namespaces.

**What the API means**

*LoadBalancer*: A LoadBalancer is a kind of networking resource which can handle incoming traffic. It is a networking resource in the cluster, just like a Node is a computing resource and a PersistentVolume is a storage resource. A LoadBalancer is usually provisioned by an administrator, and it is a cluster-scoped resource which can be shared across namespaces. (Think of Node and PersistentVolume: they are cluster-scoped computing and storage resources, respectively. Similarly, LoadBalancer is a cluster-scoped networking resource.) Every LoadBalancer has some attributes, for example the max bandwidth it can handle and what type of LoadBalancer it is (GCE/AWS/NGINX).

*LoadBalancerClaim*: A LoadBalancerClaim is a request for a LoadBalancer by a user. Ingresses consume LoadBalancerClaim resources, and LoadBalancerClaims consume LoadBalancer resources. Since LoadBalancer is a cluster-scoped resource and Ingress is a namespace-scoped resource, an Ingress cannot directly consume a LoadBalancer. Instead, there is an "LBSource" field in IngressSpec which refers to a LoadBalancerClaim resource; in the LoadBalancerClaim the user specifies what kind of LoadBalancer and how much bandwidth the Ingress needs. In other words, an Ingress actually consumes a LoadBalancer indirectly via a LoadBalancerClaim. (Think of a Pod mounting a PersistentVolume indirectly via a PersistentVolumeClaim.)

**Implementation Options**

*Option 1: implement most of the logic in Controller-Manager*

Ingress-Controller: If we implement it this way, we basically treat an ingress-controller as an L7 loadbalancer; one ingress-controller corresponds to one L7 loadbalancer. The current ingress-controller implementation does not need big changes, except that it only list&watches the Ingresses assigned to it (there is a loadbalancer-controller in the Controller-Manager component which is responsible for assigning Ingresses to ingress-controllers).

LoadBalancer Controller: Add a new loadbalancer-controller to the Controller-Manager component, which works as follows:

Pros and Cons

*Option 2: implement most of the logic in Ingress-Controller*

If we implement it this way, a single Ingress-Controller should manage multiple loadbalancers.

On bare-metal:

On cloud providers:

Pros and Cons
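To make the objects described above more concrete, here is a rough, hypothetical Go sketch of what the types could look like (type and field names are illustrative only, not a final API):

```go
// Hypothetical sketch only: illustrates the LoadBalancer / LoadBalancerClaim /
// LBSource shapes described above. Names and fields are not a final API.
package lbapi

import (
	"k8s.io/apimachinery/pkg/api/resource"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
)

// LoadBalancer is a cluster-scoped networking resource, analogous to a
// PersistentVolume for storage. It is usually provisioned by an administrator.
type LoadBalancer struct {
	metav1.TypeMeta   `json:",inline"`
	metav1.ObjectMeta `json:"metadata,omitempty"`
	Spec              LoadBalancerSpec `json:"spec"`
}

// LoadBalancerSpec describes the load balancer's implementation and capacity.
type LoadBalancerSpec struct {
	// Class identifies the implementation, e.g. "gce", "aws", "nginx".
	Class string `json:"class"`
	// Capacity lists resources the load balancer offers, e.g. "bandwidth".
	Capacity map[string]resource.Quantity `json:"capacity,omitempty"`
}

// LoadBalancerClaim is a namespaced request for load-balancer capacity,
// analogous to a PersistentVolumeClaim.
type LoadBalancerClaim struct {
	metav1.TypeMeta   `json:",inline"`
	metav1.ObjectMeta `json:"metadata,omitempty"`
	Spec              LoadBalancerClaimSpec `json:"spec"`
}

// LoadBalancerClaimSpec says what kind of LoadBalancer, and how much of it,
// the claim needs.
type LoadBalancerClaimSpec struct {
	// Class of LoadBalancer being requested.
	Class string `json:"class"`
	// Requests lists requested resources, e.g. the bandwidth the Ingress needs.
	Requests map[string]resource.Quantity `json:"requests,omitempty"`
	// LoadBalancerName is filled in by the binder (or set manually) once the
	// claim is bound to a LoadBalancer.
	LoadBalancerName string `json:"loadBalancerName,omitempty"`
}

// LBSource would hang off IngressSpec: an Ingress either names a LoadBalancer
// directly or, more commonly, goes through a claim (think Pod volumes: PV vs PVC).
type LBSource struct {
	LoadBalancerName      string `json:"loadBalancerName,omitempty"`
	LoadBalancerClaimName string `json:"loadBalancerClaimName,omitempty"`
}
```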
|
|
There are other options you should add:
|
|
Re-open when ready with details from new doc? |
|
sure. I believe we will re-open it soon :) |
@kubernetes/sig-network-proposals @kubernetes/sig-network-misc