
Ingress HA, Scheduling, and Provisioning Proposal #34013

Closed
wants to merge 1 commit

Conversation

@mqliang (Contributor) commented Oct 4, 2016

The idea in this doc is proposed at a high level; the API/implementation design is also rough and mainly serves to illustrate the idea behind it. Comments are welcome.

@kubernetes/sig-network

cc @ddysher @hongchaodeng



@mqliang changed the title from "ingress proposal" to "HA, Scheduling, and Provisioning Proposal" Oct 4, 2016
@mqliang changed the title from "HA, Scheduling, and Provisioning Proposal" to "Ingress HA, Scheduling, and Provisioning Proposal" Oct 4, 2016
@k8s-github-robot added the kind/design, size/L, and release-note-label-needed labels Oct 4, 2016
@k8s-ci-robot (Contributor)

Jenkins GCI GKE smoke e2e failed for commit afa6d7a344ae1b106598396a2f6b0964b1b0d9b8. Full PR test history.

The magic incantation to run this job again is @k8s-bot gci gke e2e test this. Please help us cut down flakes by linking to an open flake issue when you hit one in your PR.

@k8s-ci-robot (Contributor)

Jenkins GKE smoke e2e failed for commit afa6d7a344ae1b106598396a2f6b0964b1b0d9b8. Full PR test history.

The magic incantation to run this job again is @k8s-bot gke e2e test this. Please help us cut down flakes by linking to an open flake issue when you hit one in your PR.

@k8s-ci-robot (Contributor)

Jenkins verification failed for commit d95360a. Full PR test history.

The magic incantation to run this job again is @k8s-bot verify test this. Please help us cut down flakes by linking to an open flake issue when you hit one in your PR.

@bprashanth (Contributor) left a comment

@kubernetes/sig-network
overall suggest streamlining this with the claims issue: #30151

## Goal
This Proposal aims to address the above issues by the following mechanism:

* Ingress HA: using keepalived and VIP to provide High Availability(mainly
Contributor

clarify how much this gives us over just running a Service over the ingress controller(s)?

Contributor (Author)

Do you mean running a NodePort-type Service over the ingress controller(s) and using keepalived-vip to provide HA for that Service?

I just want to simplify Ingress creation. Currently it requires:

  • create an Ingress ReplicaSet
  • create a NodePort-type Service over the Ingress ReplicaSet
  • create a keepalived-vip DaemonSet to provide HA for the Ingress Service

That's not a "happy path" and it hinders automation (especially considering Ingress auto-provisioning). Instead, it would be much more helpful to just create an Ingress ReplicaSet and implement the HA logic in the Ingress Pod.
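
For concreteness, here is a minimal sketch of the middle step above (the NodePort-type Service fronting the ingress controller pods), written against client-go API types; the names, labels, and ports are illustrative assumptions, not anything defined in this proposal.

```go
// Hypothetical sketch of the "NodePort-type Service over the Ingress
// ReplicaSet" step. All names, labels, and ports are illustrative only.
package sketch

import (
	corev1 "k8s.io/api/core/v1"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/apimachinery/pkg/util/intstr"
)

// ingressFrontendService builds the NodePort Service that exposes the
// ingress controller pods (assumed here to carry the label app=ingress-lb).
func ingressFrontendService() *corev1.Service {
	return &corev1.Service{
		ObjectMeta: metav1.ObjectMeta{
			Name:      "ingress-lb",
			Namespace: "kube-system",
		},
		Spec: corev1.ServiceSpec{
			Type:     corev1.ServiceTypeNodePort,
			Selector: map[string]string{"app": "ingress-lb"},
			Ports: []corev1.ServicePort{
				{Name: "http", Port: 80, TargetPort: intstr.FromInt(80)},
				{Name: "https", Port: 443, TargetPort: intstr.FromInt(443)},
			},
		},
	}
}
```

A keepalived-vip DaemonSet advertising a VIP for this Service would then be the third manual step; the argument above is that collapsing these steps into the Ingress pool itself would make auto-provisioning much easier.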

for nginx/haproxy implementation, cloud implementation usually already
provide HA)
* Ingress Scheduling: schedule Ingress Resource to Ingress Pod/ReplicaSet
* Ingress Provisioning: allow user to dynamically add Ingress Pod/ReplicaSet
Contributor

This is a good goal, one we thought to solve with ingress claims: #30151. We haven't fleshed out the model yet.

Contributor

This is a pretty important feature. Is anyone actively working on it? We've already started working on some of it; @mqliang will update the proposal in a few days.

* Ingress HA: using keepalived and VIP to provide High Availability(mainly
for nginx/haproxy implementation, cloud implementation usually already
provide HA)
* Ingress Scheduling: schedule Ingress Resource to Ingress Pod/ReplicaSet
Contributor

Do you really want scheduling, or is that taking it too far?
With claims you have different qos classes, and a user picks a class.
The ingress is satisfied by whatever's behind that class, be it a single pod, a group of pods, a cloud lb etc.

@mqliang (Contributor, Author) commented Oct 8, 2016

I think there should be three mechanisms to choose from:

  • If the user knows exactly which Ingress Pod/RS to use, they just use it.
  • If the user doesn't know exactly which Ingress Pod/RS to use, but knows they want an Ingress Pod with certain properties (cpu/mem/bandwidth available, nginx/haproxy/cloud lb implementation, etc.), they can claim one, and Kubernetes will iterate over all existing Ingress Pods/RSs to find the best match (in other words, scheduling).
  • The user can also claim an Ingress Pod/RS with an "auto-provision" annotation; in that case, Kubernetes will dynamically provision one.

This is just like the relationship between PV and PVC:

  • If the user wants to use a specific cloud disk (and knows the cloud disk id), they just use it.
  • If the user just wants a PV of some size with certain properties, they can claim one by creating a PVC: Kubernetes will find the best-matching PV (scheduling) for them.
  • The user can also create a PVC with an auto-provision annotation; in that case Kubernetes will call the cloud provider API to create a cloud disk.
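
To make the PV/PVC analogy concrete, here is a rough Go type sketch of what a claim covering these three mechanisms might look like. None of these types exist in Kubernetes; every type and field name below is hypothetical.

```go
// Hypothetical API sketch of an "IngressClaim", modelled on
// PersistentVolumeClaim. Types and fields are illustrative only.
package sketch

import (
	"k8s.io/apimachinery/pkg/api/resource"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
)

// IngressClaim requests an ingress load balancer, either by naming one
// explicitly (mechanism 1) or by describing required properties (mechanism 2).
type IngressClaim struct {
	metav1.TypeMeta   `json:",inline"`
	metav1.ObjectMeta `json:"metadata,omitempty"`

	Spec   IngressClaimSpec   `json:"spec,omitempty"`
	Status IngressClaimStatus `json:"status,omitempty"`
}

// IngressClaimSpec describes what the user wants.
type IngressClaimSpec struct {
	// IngressRef names a specific Ingress Pod/RS pool to use (mechanism 1).
	IngressRef string `json:"ingressRef,omitempty"`

	// Implementation restricts matching to "nginx", "haproxy", "cloud",
	// and so on (mechanism 2).
	Implementation string `json:"implementation,omitempty"`

	// Resources lists required spare capacity on the ingress pool,
	// e.g. cpu, memory, or bandwidth (mechanism 2).
	Resources map[string]resource.Quantity `json:"resources,omitempty"`

	// Mechanism 3 (auto-provisioning) would be requested via an annotation
	// on the claim, mirroring the early PV dynamic-provisioning annotation,
	// so it needs no extra field here.
}

// IngressClaimStatus records the binding decision.
type IngressClaimStatus struct {
	// BoundRef names the Ingress Pod/RS pool this claim was bound to.
	BoundRef string `json:"boundRef,omitempty"`
	// VIP is the virtual IP serving this claim, once bound.
	VIP string `json:"vip,omitempty"`
}
```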

Contributor

What's involved in scheduling, QoS? I agree with @bprashanth this is a little far; maybe you just mean bound?

> If the user doesn't know exactly which Ingress Pod/RS to use, but knows they want an Ingress Pod with certain properties (cpu/mem/bandwidth available, nginx/haproxy/cloud lb implementation, etc.), they can claim one, and Kubernetes will iterate over all existing Ingress Pods/RSs to find the best match (in other words, scheduling).

* using AntiAffinity feature so that Ingress Pod created by the same Ingress
ReplicaSet could be scheduled to different node
* cluster admin choose a CIDR for Ingress VIP(AKA IngreeVIPCIDR)
* each Ingress Replicaset will be allocated a VIP from IngreeVIPCIDR(allocated by
Contributor

In the claims proposal, each ingress claim would get a vip.
"Ingress ReplicaSet" is a term which doesn't make sense to me; today an Ingress points to Services, which may point to ReplicaSets.

@ddysher (Contributor) commented Oct 12, 2016

I believe by "Ingress ReplicaSet", @mqliang means something that actually holds the vip, kind of like the ingress controller in the current setup. However, by "each ingress claim would get a vip", you seem to suggest that the lifecycle of the vip is bound to the claim? I'm trying to figure out the difference, and whether we want to apply the PV/PVC model to ingress claims.

If we do want to use the PV/PVC model, then there needs to be such an "Ingress ReplicaSet" that actually holds the VIP, rather than the claim holding it. Users can then create ingress claims to claim a VIP. It is the VIP, not the claim, that has attributes on it, like qos.

If not, then the only new type we need is the ingress claim; in that case, how do we add more ingress resources to an existing claim? For example, if a user creates an ingress claim for ".foo.com", a vip is allocated for it but is not yet useful; next, the user creates ingress resources to consume the DNS name and vip. Now suppose some more ingress resources are needed for the DNS name ".bar.com" and the user wants to use the same vip; how do they achieve this? Do they have to edit their claim?

* Ingress ReplicaSet rolling update

## Ingress HA
(AKA: Ingress Virtual IP using keepalived)
Contributor

I think we need to separate the keepalived details from the API. We need a way to get a vip, public or private. That may be keepalived, or an iptables proxy, or something new.

Contributor (Author)

I agree

* Ingress ReplicaSets are created by cluster admin in advance
* If all Ingress Pods are saturated, it's cluster admin's duty to create
more Ingress ReplicaSets
* There is a Ingress Scheduler which will schedule Ingress Resources to Ingress
Contributor

IMO this might not be necessary. Users pick an Ingress claim based on QoS needs. The Ingress claim has a vip. The vip is backed by pods. If the pods are saturated, the admin or an autoscaler needs to scale them, but the vip doesn't change.

@mqliang (Contributor, Author) commented Oct 12, 2016

> The Ingress claim has a vip

It seems more reasonable that:

  • An Ingress Service has a vip, external or internal (if the user wants in-cluster L7 load balancing), and is backed by several Ingress Pods (possibly created by a ReplicaSet).
  • The user can specify an Ingress Service for an Ingress Resource.
  • If the user doesn't know which Ingress Service to specify, they can use an IngressClaim; an ingress-claim-controller will then iterate through all Ingress Services, find the best match, and bind that Ingress Service to the IngressClaim.
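
A sketch of the binding step such an ingress-claim-controller could run is shown below; the types and the matching rule are deliberately simplified stand-ins (first unbound service with an acceptable implementation), not an existing API.

```go
// Hypothetical sketch of the ingress-claim-controller binding step: walk the
// existing Ingress Services, pick a match, and record the claim on it.
// Types and matching logic are illustrative stand-ins only.
package sketch

import "errors"

// ingressService stands in for an "Ingress Service": a vip backed by a pool
// of Ingress Pods of a given implementation.
type ingressService struct {
	Name           string
	Implementation string // "nginx", "haproxy", "cloud", ...
	VIP            string
	BoundClaim     string // empty while unbound
}

// ingressClaim stands in for the user's IngressClaim.
type ingressClaim struct {
	Name           string
	Implementation string // empty means any implementation is acceptable
}

// bindClaim mirrors PVC binding: iterate the candidates, take the first
// acceptable unbound service, and mark it as bound to this claim.
func bindClaim(claim *ingressClaim, services []*ingressService) (*ingressService, error) {
	for _, svc := range services {
		if svc.BoundClaim != "" {
			continue // already serving another claim
		}
		if claim.Implementation != "" && claim.Implementation != svc.Implementation {
			continue // wrong implementation
		}
		svc.BoundClaim = claim.Name
		return svc, nil
	}
	return nil, errors.New("no matching Ingress Service; auto-provisioning could be triggered here")
}
```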

@mqliang (Contributor, Author) commented Oct 12, 2016

> the admin or an autoscaler needs to scale them

Scale up or scale out?

@bgrant0607 assigned bprashanth and unassigned bgrant0607 Oct 5, 2016
the IP addresss of the node where Ingress Pod is running. In case of a
failure the Ingress Pod can be be moved to a different node.
* How many Ingress Pod should run in a cluster? Should all Ingress Pod
list&watch all Ingress Resource with out distinction? There is no way
Contributor

s/with out/without

## Ingress Provisoning

#### High level design
* Ingress ReplicaSets could be dynamically provisioned on deman, instead of
Contributor

s/deman/demand

@ddysher mentioned this pull request Oct 13, 2016
@k8s-github-robot added the do-not-merge label Nov 10, 2016
@k8s-github-robot

This PR hasn't been active in 30 days. It will be closed in 59 days (Jan 10, 2017).

cc @bprashanth @mqliang

You can add the 'keep-open' label to prevent this from happening, or add a comment to keep it open for another 90 days.

@apelisse removed the do-not-merge label Nov 11, 2016
@mqliang (Contributor, Author) commented Nov 22, 2016

Closing this in favor of #37269.
