Ingress HA, Scheduling, and Provisioning Proposal #34013
Conversation
Jenkins GCI GKE smoke e2e failed for commit afa6d7a344ae1b106598396a2f6b0964b1b0d9b8. Full PR test history. The magic incantation to run this job again is
Jenkins GKE smoke e2e failed for commit afa6d7a344ae1b106598396a2f6b0964b1b0d9b8. Full PR test history. The magic incantation to run this job again is
afa6d7a to d95360a
Jenkins verification failed for commit d95360a. Full PR test history. The magic incantation to run this job again is
@kubernetes/sig-network
Overall, I suggest streamlining this with the claims issue: #30151

## Goal
This Proposal aims to address the above issues by the following mechanism:

* Ingress HA: using keepalived and VIP to provide High Availability(mainly
clarify how much this gives us over just running a Service over the ingress controller(s)?
Do you mean running a NodePort type Service over the ingress controller(s) and using keepalived-vip to provide HA for the Service?
I just want to simplify Ingress creation (step 2 is sketched below):
- create an Ingress ReplicaSet
- create a NodePort type Service over the Ingress ReplicaSet
- create a keepalived-vip DaemonSet to provide HA for the Ingress Service
That's not a "happy path" and it hinders automation (especially considering Ingress auto-provisioning). Instead, it would be much more helpful to just create an Ingress ReplicaSet and implement the HA logic in the Ingress Pod.
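For concreteness, a minimal sketch of step 2 above, assuming an ingress controller ReplicaSet whose pods carry an illustrative `app: nginx-ingress-controller` label (all names here are hypothetical, not from the proposal):

```yaml
apiVersion: v1
kind: Service
metadata:
  name: nginx-ingress-lb            # illustrative name
spec:
  type: NodePort                    # exposes the controller on every node
  selector:
    app: nginx-ingress-controller   # matches the Ingress ReplicaSet's pods
  ports:
  - name: http
    port: 80
    targetPort: 80
  - name: https
    port: 443
    targetPort: 443
```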
for nginx/haproxy implementation, cloud implementation usually already
provide HA)
* Ingress Scheduling: schedule Ingress Resource to Ingress Pod/ReplicaSet
* Ingress Provisioning: allow user to dynamically add Ingress Pod/ReplicaSet
This is a good goal, one we thought to solve with ingress claims: #30151. We haven't fleshed out the model yet.
This is a pretty important feature. Is anyone actively working on it? We've already started working on some of it; @mqliang will update the proposal in a few days.

* Ingress HA: using keepalived and VIP to provide High Availability(mainly
for nginx/haproxy implementation, cloud implementation usually already
provide HA)
* Ingress Scheduling: schedule Ingress Resource to Ingress Pod/ReplicaSet
Do you really want scheduling, or is that taking it too far?
With claims you have different qos classes, and a user picks a class.
The ingress is satisfied by whatever's behind that class, be it a single pod, a group of pods, a cloud lb, etc.
I think there should be three mechanisms for choosing:
- If the user knows exactly which Ingress Pod/RS to use, they just use it.
- If the user doesn't know exactly which Ingress Pod/RS to use, but knows they want an Ingress Pod with certain properties (cpu/mem/bandwidth available, nginx/haproxy/cloud lb implementation, etc.), they can claim one, and Kubernetes will iterate over all existing Ingress Pods/RSes to find the best match (in other words, scheduling).
- The user can also claim an Ingress Pod/RS with an "auto-provision" annotation; in that case, Kubernetes will dynamically provision one.
This is just like the relationship between PV and PVC (see the sketch after this list):
- If the user wants to use a specific cloud disk (and knows the cloud disk id), they just use it.
- If the user just wants a PV of a certain size with some properties, they can claim one by creating a PVC: Kubernetes will find the best-matching PV (scheduling).
- The user can also create a PVC with an auto-provision annotation; in that case, Kubernetes will call the cloud provider API to create a cloud disk.
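As a concrete anchor for the analogy, the third PV/PVC case looked roughly like this at the time: a PVC carrying the (then-beta) storage-class annotation that triggered dynamic provisioning. The claim name, class name, and size are illustrative:

```yaml
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: my-claim
  annotations:
    # triggers dynamic provisioning of a matching cloud disk
    volume.beta.kubernetes.io/storage-class: standard
spec:
  accessModes:
  - ReadWriteOnce
  resources:
    requests:
      storage: 8Gi   # "a PV of a certain size"
```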
What's involved in scheduling, qos? I agree with @bprashanth this is a little far; maybe you just mean bound?
If the user doesn't know exactly which Ingress Pod/RS to use, but knows they want an Ingress Pod with certain properties (cpu/mem/bandwidth available, nginx/haproxy/cloud lb implementation, etc.), they can claim one, and Kubernetes will iterate over all existing Ingress Pods/RSes to find the best match (in other words, scheduling)
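For reference, "binding" an Ingress to a particular controller already has a mechanism today: the `kubernetes.io/ingress.class` annotation. A minimal sketch (host and service names are illustrative):

```yaml
apiVersion: extensions/v1beta1
kind: Ingress
metadata:
  name: echo
  annotations:
    kubernetes.io/ingress.class: nginx   # bind this Ingress to the nginx controller
spec:
  rules:
  - host: echo.example.com
    http:
      paths:
      - path: /
        backend:
          serviceName: echo
          servicePort: 80
```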
* using AntiAffinity feature so that Ingress Pod created by the same Ingress
ReplicaSet could be scheduled to different node
* cluster admin choose a CIDR for Ingress VIP(AKA IngreeVIPCIDR)
* each Ingress Replicaset will be allocated a VIP from IngreeVIPCIDR(allocated by
In the claims proposal, each ingress claim would get a vip.
"Ingress Replicaset" is a term which doesn't make sense to me; today an Ingress points to Services, which may point to ReplicaSets.
I believe by "Ingress ReplicaSet", @mqliang means something that actually holds the vip, kind of like the ingress controller in the current setup. However, by "each ingress claim would get a vip", you seem to suggest that the lifecycle of the vip is bound to the claim? I'm trying to figure out the difference, and whether we want to apply the PV/PVC model to ingress claims.
If we do want to use the PV/PVC model, then there needs to be such an "Ingress ReplicaSet" which actually holds the VIP, rather than the claim holding it. The user can then create ingress claims to claim a VIP. It is the VIP, not the claim, that has attributes on it, like qos.
If not, then the only new type we need is the ingress claim, in which case, how do we add more ingress resources to an existing claim? For example, if a user creates an ingress claim for ".foo.com", a vip is allocated for it, but it's not yet useful; next, the user creates ingress resources to consume the DNS and vip. Now suppose some more ingress resources are needed for the DNS ".bar.com" and the user wants to use the same vip; how do they achieve this? Do they have to edit their claim?
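To make the second model concrete, here is a purely hypothetical IngressClaim shape (no such API type exists; every field below is invented for illustration), showing why adding ".bar.com" later would mean editing the claim:

```yaml
apiVersion: extensions/v1beta1   # hypothetical; IngressClaim is not a real type
kind: IngressClaim
metadata:
  name: foo-com
spec:
  class: nginx-ha    # hypothetical qos/class selector
  dnsNames:
  - "*.foo.com"      # adding "*.bar.com" later requires editing this list
```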
* Ingress ReplicaSet rolling update

## Ingress HA
(AKA: Ingress Virtual IP using keepalived)
I think we need to separate the keepalived details from the api. We need a way to get a vip, public or private. That may be keepalived, an iptables proxy, or something new.
I agree
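For reference, the contrib keepalived-vip implementation already hides the VRRP details behind a ConfigMap that maps each VIP to the namespace/name of the Service it should front. A sketch (the VIP and service name are illustrative):

```yaml
apiVersion: v1
kind: ConfigMap
metadata:
  name: vip-configmap
data:
  10.4.0.50: default/nginx-ingress-lb   # VIP -> namespace/service to front
```

Keeping the user-facing API at this level means the VIP backend (keepalived, an iptables proxy, or something new) can be swapped without touching user objects.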
* Ingress ReplicaSets are created by cluster admin in advance
* If all Ingress Pods are saturated, it's cluster admin's duty to create
more Ingress ReplicaSets
* There is a Ingress Scheduler which will schedule Ingress Resources to Ingress
IMO this might not be necessary. Users pick an Ingress claim based on QoS needs. The Ingress claim has a vip. The vip is backed by pods. If the pods are saturated, the admin or an autoscaler needs to scale them, but the vip doesn't change.
The Ingress claim has a vip
It seems more reasonable that:
- An Ingress Service has a vip, external or internal (if the user wants in-cluster L7 load balancing), and is backed by several Ingress Pods (possibly created by a ReplicaSet)
- The user can specify an Ingress Service for an Ingress Resource
- If the user doesn't know which Ingress Service to specify, they can use an IngressClaim; an ingress-claim-controller will then iterate through all Ingress Services, find the best-matching one, and bind it to the IngressClaim.
the admin or an autoscaler needs to scale them
scale up or scale out?
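If scaling out is the answer, a HorizontalPodAutoscaler over the controller pods could handle this without the vip changing. A sketch, assuming the controller runs as an illustrative Deployment named nginx-ingress-controller:

```yaml
apiVersion: autoscaling/v1
kind: HorizontalPodAutoscaler
metadata:
  name: nginx-ingress-controller
spec:
  scaleTargetRef:
    apiVersion: extensions/v1beta1
    kind: Deployment
    name: nginx-ingress-controller   # illustrative target
  minReplicas: 2
  maxReplicas: 10
  targetCPUUtilizationPercentage: 80   # scale out when controller CPU is hot
```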
the IP address of the node where Ingress Pod is running. In case of a
failure the Ingress Pod can be moved to a different node.
* How many Ingress Pod should run in a cluster? Should all Ingress Pod
list&watch all Ingress Resource with out distinction? There is no way
s/with out/without
## Ingress Provisioning

#### High level design
* Ingress ReplicaSets could be dynamically provisioned on deman, instead of
s/deman/demand
This PR hasn't been active in 30 days. It will be closed in 59 days (Jan 10, 2017). You can add the 'keep-open' label to prevent this from happening, or add a comment to keep it open another 90 days.
close this in favor of #37269
The idea in this doc is proposed at a high level; the API/implementation design is also rough, mainly to illustrate the idea behind it. Comments are welcome.
@kubernetes/sig-network
cc @ddysher @hongchaodeng