
Kubeadm HA ( high availability ) checklist #261

Closed
timothysc opened this Issue May 2, 2017 · 32 comments

Comments

@timothysc
Member

timothysc commented May 2, 2017

The following is a checklist for kubeadm support for deploying HA clusters. This is a distillation of action items from:
https://docs.google.com/document/d/1lH9OKkFZMSqXCApmSXemEDuy9qlINdm5MfWWGrK3JYc/edit#heading=h.8hdxw3quu67g

but there may be more.

New Features:

Contentious:

Extending Support & Documentation:

Cleanup cruft:

  • Remove docs and contrib references to older HA.

/cc @kubernetes/sig-cluster-lifecycle-feature-requests


@timothysc timothysc changed the title from Kubeadm high availability checklist to Kubeadm HA ( high availability ) checklist May 12, 2017

@jamiehannaford
Member

jamiehannaford commented May 23, 2017

@timothysc In order to do "Enable support for ComponentConfigs to be loaded from ConfigMaps", don't both the controller-manager and scheduler need to be able to boot their config from a ConfigMap first? Or is the plan just to add the ConfigMap manifests to kubeadm so that we can use the new leadership election feature, and pave the way for future ConfigMap use as and when it's implemented?

@timothysc timothysc self-assigned this May 23, 2017

@timothysc
Member

timothysc commented May 23, 2017

@jamiehannaford There are two parts:

  1. The first part is just loading the Controller Manager and Scheduler config from a file. The plan of record is to use a serialized ComponentConfig object for this. Once that work is done (~1.8, @mikedanese?), combined with @ncdc's example on the proxy, we should be able to transition the other components.

  2. From there, simply volume-mounting a ConfigMap will allow this load and also provide the locking location.
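
As an aside for readers following the thread: step 2 above applies to an API-managed (e.g. self-hosted) control plane, since plain static pods cannot reference ConfigMaps. A minimal sketch of the pattern is below; the ConfigMap name `kube-scheduler-config`, the mounted file name `config.yaml`, and the `--config` flag are illustrative assumptions, not kubeadm output.

```yaml
# Sketch only: a self-hosted kube-scheduler loading a serialized ComponentConfig
# from a ConfigMap mounted as a file. Names and flags here are assumptions.
apiVersion: apps/v1
kind: DaemonSet
metadata:
  name: self-hosted-kube-scheduler
  namespace: kube-system
spec:
  selector:
    matchLabels:
      k8s-app: self-hosted-kube-scheduler
  template:
    metadata:
      labels:
        k8s-app: self-hosted-kube-scheduler
    spec:
      nodeSelector:
        node-role.kubernetes.io/master: ""
      tolerations:
      - key: node-role.kubernetes.io/master
        effect: NoSchedule
      containers:
      - name: kube-scheduler
        image: gcr.io/google_containers/kube-scheduler-amd64:v1.9.0
        command:
        - kube-scheduler
        - --config=/etc/kubernetes/scheduler/config.yaml   # the serialized ComponentConfig
        volumeMounts:
        - name: scheduler-config
          mountPath: /etc/kubernetes/scheduler
          readOnly: true
      volumes:
      - name: scheduler-config
        configMap:
          name: kube-scheduler-config   # assumed ConfigMap holding the ComponentConfig
```

The same ConfigMap (or one like it) could then double as the object the leader-election lock annotation is written to, which is the "locking location" mentioned above.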

@ncdc
Member

ncdc commented May 23, 2017

I do hope to have time to work on ComponentConfig for all the remaining components over the next couple of releases.

@jamiehannaford
Member

jamiehannaford commented Jun 6, 2017

@ncdc @timothysc Is there a wider epic issue for the componentconfig stuff?

@jamiehannaford
Member

jamiehannaford commented Jun 12, 2017

@timothysc I've seen the Google doc; I meant a GitHub issue for tracking work across different components.

@lewismarshall lewismarshall referenced this issue Jun 12, 2017: Native kubeadm HA progress #32 (Open, 0 of 2 tasks complete)

@luxas luxas self-assigned this Jul 14, 2017

@luxas
Member

luxas commented Aug 19, 2017

Moving the milestone to v1.9. We have a rough design doc in v1.8 and are building the groundwork for making HA possible in v1.9.

@luxas luxas modified the milestones: v1.9, v1.8 Aug 19, 2017

@kapilt

kapilt commented Aug 23, 2017

Can the docs linked here be made public? All of them show an access permission request form when clicking through. Is there a current design doc extant?

@XiLongZheng

XiLongZheng commented Oct 24, 2017

@luxas, just curious whether this is still targeted for 1.9?

k8s-merge-robot added a commit to kubernetes/kubernetes that referenced this issue Oct 26, 2017

Merge pull request #54539 from jamiehannaford/add-ha-feature-gate
Automatic merge from submit-queue (batch tested with PRs 54593, 54607, 54539, 54105). If you want to cherry-pick this change to another branch, please follow the instructions here: https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md

Add HA feature gate and minVersion validation

**What this PR does / why we need it**:

As we add more feature gates, there might be occasions where a feature is only available on newer releases of K8s. If a user makes a mistake, we should notify them as soon as possible in the init procedure and not let them go down the path of hard-to-debug component issues.

Specifically with HA, we ideally need the new `TaintNodesByCondition` (added in v1.8.0 but working in v1.9.0).

**Which issue this PR fixes:**

kubernetes/kubeadm#261
kubernetes/kubeadm#277

**Release note**:
```release-note
Feature gates now check minimum versions
```

/cc @kubernetes/sig-cluster-lifecycle-pr-reviews @luxas @timothysc
@luxas
Member

luxas commented Oct 27, 2017

@XiLongZheng Yes, we hope so. Latest design doc we'll check into Github next week: https://docs.google.com/document/d/1P3oUJ_kdaRSTlGONujadGBpYegjn4RjBNZLHZ4zU7lI/edit#

@klausenbusk

klausenbusk commented Nov 1, 2017

> @XiLongZheng Yes, we hope so. Latest design doc we'll check into Github next week: https://docs.google.com/document/d/1P3oUJ_kdaRSTlGONujadGBpYegjn4RjBNZLHZ4zU7lI/edit#

I remember a (rather complicated?) proposal for the load balancer issue, but can't find it right now. Does anyone else remember that proposal? IIRC it was something about tweaking /etc/hosts when bootstrapping and then relying on the kubernetes service.

@luxas
Member

luxas commented Nov 1, 2017

@klausenbusk Yeah, but we dropped that as it actually didn't work well after some experimenting.
Now we're thinking about reusing kube-proxy for load balancing to the VIP short-term (or using a real, external LB obvs.), and long-term using something like Envoy.

k8s-merge-robot added a commit to kubernetes/kubernetes that referenced this issue Nov 1, 2017

Merge pull request #54543 from jamiehannaford/self-hosted-etcd-api
Automatic merge from submit-queue (batch tested with PRs 49840, 54937, 54543). If you want to cherry-pick this change to another branch, please follow the instructions here: https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md

Add self-hosted etcd API to kubeadm

**What this PR does / why we need it**:

This PR is part of a larger set that implements self-hosted etcd. This PR takes a first step by adding:

1. new API types in `cmd/kubeadm/app/apis` for configuring self-hosted etcd 
2. new Go types in `cmd/kubeadm/app/phases/etcd/spec` used for constructing EtcdCluster CRDs for the etcd-operator. The reason we define these in trunk is because kubeadm cannot import `github.com/coreos/etcd-operator` as a dependency until it's in its own repo. Until then, we need to redefine the structs in our codebase.

**Which issue this PR fixes**:

kubernetes/kubeadm#261
kubernetes/kubeadm#277

**Special notes for your reviewer**:

This is the first step PR in order to save reviewers from a goliath PR

**Release note**:
```release-note
NONE
```
@jethrogb

jethrogb commented Nov 9, 2017

I deployed this manually today and ran into the following issue when testing master failover: the IP address specified in kubeadm join is used only for discovery. After that, the IP address specified in the cluster-info ConfigMap is used. (Debugging this was extremely painful.) In my case, that ConfigMap contained the failed master's IP. It would be good to have a solution here.
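
For anyone else debugging this: the cluster-info ConfigMap referred to above lives in the kube-public namespace and embeds a stripped-down kubeconfig whose server field is what joining nodes end up using, roughly as sketched below (the address is a made-up example). If that field holds a single master's IP rather than a VIP or load-balanced endpoint, joins keep targeting that master even after it fails.

```yaml
# Rough sketch of the cluster-info ConfigMap; the address is illustrative only.
apiVersion: v1
kind: ConfigMap
metadata:
  name: cluster-info
  namespace: kube-public
data:
  kubeconfig: |
    apiVersion: v1
    kind: Config
    clusters:
    - name: ""
      cluster:
        certificate-authority-data: <base64-encoded CA>
        server: https://10.0.0.11:6443   # the endpoint joining nodes actually use
```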

@KeithTt

KeithTt commented Nov 16, 2017

I've hoped for this feature for a long time...

@bitgandtter

bitgandtter commented Nov 20, 2017

Is this feature going to make it into 1.9?

@luxas
Member

luxas commented Nov 20, 2017

No, not in its entirety. We have some work in progress, but due to the really tight schedule it's not gonna make alpha in v1.9. Instead we'll focus on documenting how to do HA "manually" (#546). There really is a lot of work to make this happen; nobody has done this kind of hands-off HA installing flow for k8s yet AFAIK, so we're falling back on what everyone else does for now.

@luxas luxas modified the milestones: v1.9, v1.10 Nov 20, 2017

@discordianfish

discordianfish commented Dec 13, 2017

@luxas FWIW, I think tectonic-installer (though bootkube-based) is closest to the goals of kubeadm; worth having a look.

@discordianfish

discordianfish commented Jan 16, 2018

Here is my stab at kubeadm HA on AWS: https://github.com/itskoko/kubecfn

@stealthybox
Contributor

stealthybox commented Jan 16, 2018

@discordianfish wow, that looks like a lot of work -- nice job

@kapilt

kapilt commented Jan 17, 2018

@discordianfish that's really nice work, and awesome that you've worked around all the bugs :-) +1

@jamiehannaford
Member

jamiehannaford commented Jan 17, 2018

Awesome job @discordianfish. If you had to do any workarounds to get kubeadm working (i.e. to fix kubeadm-specific bugs or shortcomings) would you mind opening an issue so we can document them?

@kapilt

kapilt commented Jan 18, 2018

@jamiehannaford The biggest bug that I see going through the repo is #411: effectively you have to rewrite the kubelet config that kubeadm generates, to point it from an IP to a DNS name for the masters.

The biggest shortcoming being worked around seems to be assuming ownership of etcd cluster management. The rest seems cloud-provider specific (a Lambda to maintain the DNS mapping as master hosts come and go), etc.

[edit] He filed one on the kubeadm issues (#609) and referenced it from the kubecfn repo. Notionally this is also related to #411: basically the same issue of not respecting the CLI advertise-address parameter as a URL, converting it early to an IP, and then writing that IP to everything kubeadm touches for the master address.
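
Concretely, the rewrite described above amounts to editing the server field in the kubeconfig files kubeadm writes (e.g. /etc/kubernetes/kubelet.conf), roughly as in the sketch below; the addresses and DNS name are placeholders, not taken from kubecfn.

```yaml
# Excerpt of /etc/kubernetes/kubelet.conf (a kubeconfig); values are placeholders.
clusters:
- name: kubernetes
  cluster:
    certificate-authority-data: <base64-encoded CA>
    # As generated by kubeadm (resolved master IP):
    #   server: https://10.0.0.11:6443
    # After the workaround (stable DNS name for the masters):
    server: https://masters.example.internal:6443
```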

@discordianfish

discordianfish commented Jan 18, 2018

#411 doesn't affect kubecfn since I'm not using kubeadm on the workers, because I couldn't get the token auth to play well with the multi-master HA setup. Instead I'm just using the admin.conf, which isn't ideal and is (now) tracked in itskoko/kubecfn#6.

#609 is the biggest pain point right now. The workaround is ugly, to say the least (overwriting the ConfigMap after each kubeadm run).

Another minor issue is that some paths are hardcoded in kubeadm, making it harder to pre-generate configs. For that I have to run kubeadm in a Docker container. In general, for this project I would have preferred if kubeadm had some offline mode where all it does is generate the config, similar to how bootkube does it.

Everything else is etcd related, which is IMO by far the hardest part to get right in a reliable fashion, even with CloudFormation and the signaling. So if the overall experience of setting up an HA cluster should be improved, maybe the etcd bootstrapping process could be made easier. One way would be to ignore SAN/CN completely, which IMO should still be pretty much as secure as checking it. For that I opened etcd-io/etcd#8912.

Beside all this, there are some small Kubernetes issues I filed which would have saved me tons of time. Things like:

@schmitch

schmitch commented Jan 18, 2018

What I found to be the biggest pain is that if I want to use keepalived as the "load balancer", and want to run keepalived in Kubernetes, I first need to bootstrap the Kubernetes "master cluster" with one master IP and then rewrite all configs to point to the keepalived IP.
It would be better if the Kubernetes masters only used the 'advertise api server' IP to talk with worker nodes, and the master servers communicated with their local IPs, so that one can actually pre-register an "advertise-api-server" IP that does not exist yet.

My setup is, besides that, really simple:

  • have a static haproxy pod on every master that load-balances all the master IPs (could probably also be just a DaemonSet on all masters, but I wanted to create an /etc/kubernetes/haproxy.conf and use it instead of a ConfigMap/volume; a ConfigMap is probably more sane) (a minimal sketch follows after this comment)
  • have a kube-system keepalived DaemonSet on all master servers (https://github.com/kubernetes/contrib/tree/master/keepalived-vip, but in the kube-system namespace)

So basically I just build a Kubernetes cluster with one master, create the keepalived service, register two other masters with the keepalived IP, and the ConfigMap gets rewritten when kubeadm init is called a second/third time. After that I just need to adjust all IPs (kubeadm, kubelet, admin.conf, whatever) inside the first master node to point to the keepalived IP (well, I actually used #546 (comment) to bring up my new masters and rewrite all configs/the ConfigMap to point to the keepalived IP).

And done. Basically the whole setup is self-contained and does not need any external load balancer or whatever; you just need a re-routable IP in your network.

Edit: I also used Ignition/cloud-config to bootstrap CoreOS on VMware. It's really simple, and my etcd runs on CoreOS and uses rkt (installed via cloud-config). I actually generated the etcd PKI before creating the nodes. I can write a guide if somebody needs it, and provide all the necessary configs.

My next try would be to run the kubelet via rkt, but I'm not sure if that plays well with kubeadm.
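
As referenced in the first bullet above, here is a minimal sketch of such a static haproxy pod. It is not taken from this setup: the image tag and the mounted config path (reusing the /etc/kubernetes/haproxy.conf mentioned above) are assumptions.

```yaml
# Sketch of a static haproxy pod: drop a file like this into
# /etc/kubernetes/manifests/ on each master. Image tag and paths are assumptions.
apiVersion: v1
kind: Pod
metadata:
  name: haproxy
  namespace: kube-system
spec:
  hostNetwork: true                # expose the frontend directly on the master
  containers:
  - name: haproxy
    image: haproxy:1.7
    volumeMounts:
    - name: haproxy-conf
      mountPath: /usr/local/etc/haproxy/haproxy.cfg   # where the official image reads its config
      readOnly: true
  volumes:
  - name: haproxy-conf
    hostPath:
      path: /etc/kubernetes/haproxy.conf   # host file listing all master IPs as backends
```

The keepalived DaemonSet from the second bullet then keeps the VIP pointed at a master where this pod is serving.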

@timothysc timothysc modified the milestones: v1.10, v1.11 Jan 24, 2018

@kapilt

kapilt commented Jan 30, 2018

@discordianfish FWIW the issue with #411, I think, underlies the ConfigMap issue, and also causes the IP to be written out in the master config, which the kubecfn Makefile uses sed on to restore the cluster name. It's basically that, early on, the DNS name given gets overwritten by the resolved IP, and that then gets written out everywhere kubeadm touches.

@timothysc timothysc added the triaged label Jan 31, 2018

@timothysc
Member

timothysc commented Apr 7, 2018

Closing this original parent issue as plans have changed; we will have updated issues and PRs coming in 1.11.

/cc @fabriziopandini

@timothysc timothysc closed this Apr 7, 2018

@discordianfish

discordianfish commented Apr 8, 2018

@timothysc Where can I read about the new plan?

@chulkilee

chulkilee commented May 2, 2018

I believe the new issue for the new plan is #751

@jethrogb

jethrogb commented May 2, 2018

Somewhat surprised kubernetes/community#1707 wasn't mentioned here when that PR was originally opened.
