
Kubeadm HA ( high availability ) checklist #261

Closed
timothysc opened this Issue May 2, 2017 · 32 comments

Comments

@timothysc
Member

timothysc commented May 2, 2017

The following is a checklist for kubeadm support for deploying HA clusters. This is a distillation of action items from:
https://docs.google.com/document/d/1lH9OKkFZMSqXCApmSXemEDuy9qlINdm5MfWWGrK3JYc/edit#heading=h.8hdxw3quu67g

but there may be more.

New Features:

Contentious:

Extending Support & Documentation:

Cleanup cruft:

  • Remove docs and contrib references to older HA.

/cc @kubernetes/sig-cluster-lifecycle-feature-requests


@timothysc timothysc changed the title from Kubeadm high availability checklist to Kubeadm HA ( high availability ) checklist May 12, 2017

@jamiehannaford
Member

jamiehannaford commented May 23, 2017

@timothysc In order to do "Enable support for ComponentConfigs to be loaded from ConfigMaps", don't both the controller-manager and scheduler need to be able to boot their config from a ConfigMap first? Or is the plan just to add the ConfigMap manifests to kubeadm so that we can use the new leadership election feature, and pave the way for future ConfigMap use as and when it's implemented?

@timothysc timothysc self-assigned this May 23, 2017

@timothysc
Member

timothysc commented May 23, 2017

@jamiehannaford There are two parts:

  1. The first part is just loading the Controller Manager and Scheduler config from a file. The plan of record is to use a serialized ComponentConfig object for this. Once that work is done (~1.8, @mikedanese?), combined with @ncdc's example on the proxy, we should be able to transition the other components.

  2. From there, simply volume-mounting a ConfigMap will allow this load and also provide the locking location.
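
As an aside for readers following the thread: step 2 above applies to an API-managed (e.g. self-hosted) control plane, since plain static pods cannot reference ConfigMaps. A minimal sketch of the pattern is below; the ConfigMap name `kube-scheduler-config`, the mounted file name `config.yaml`, and the `--config` flag are illustrative assumptions, not kubeadm output.

```yaml
# Sketch only: a self-hosted kube-scheduler loading a serialized ComponentConfig
# from a ConfigMap mounted as a file. Names and flags here are assumptions.
apiVersion: apps/v1
kind: DaemonSet
metadata:
  name: self-hosted-kube-scheduler
  namespace: kube-system
spec:
  selector:
    matchLabels:
      k8s-app: self-hosted-kube-scheduler
  template:
    metadata:
      labels:
        k8s-app: self-hosted-kube-scheduler
    spec:
      nodeSelector:
        node-role.kubernetes.io/master: ""
      tolerations:
      - key: node-role.kubernetes.io/master
        effect: NoSchedule
      containers:
      - name: kube-scheduler
        image: gcr.io/google_containers/kube-scheduler-amd64:v1.9.0
        command:
        - kube-scheduler
        - --config=/etc/kubernetes/scheduler/config.yaml   # the serialized ComponentConfig
        volumeMounts:
        - name: scheduler-config
          mountPath: /etc/kubernetes/scheduler
          readOnly: true
      volumes:
      - name: scheduler-config
        configMap:
          name: kube-scheduler-config   # assumed ConfigMap holding the ComponentConfig
```

The same ConfigMap (or one like it) could then double as the object the leader-election lock annotation is written to, which is the "locking location" mentioned above.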

@ncdc
Member

ncdc commented May 23, 2017

I do hope to have time to work on ComponentConfig for all the remaining components over the next couple of releases.

@jamiehannaford
Member

jamiehannaford commented Jun 6, 2017

@ncdc @timothysc Is there a wider epic issue for the componentconfig stuff?

@jamiehannaford
Member

jamiehannaford commented Jun 12, 2017

@timothysc I've seen the Google doc; I meant a GitHub issue for tracking work across different components.

@lewismarshall lewismarshall referenced this issue Jun 12, 2017: Native kubeadm HA progress #32 (Open, 0 of 2 tasks complete)

@luxas luxas self-assigned this Jul 14, 2017

@luxas
Member

luxas commented Aug 19, 2017

Moving the milestone to v1.9. We have a rough design doc in v1.8 and are building the groundwork for making HA possible in v1.9.

@luxas luxas modified the milestones: v1.9, v1.8 Aug 19, 2017

@kapilt

kapilt commented Aug 23, 2017

Can the docs linked here be made public? All of them show an access permission request form when clicking through. Is there a current design doc extant?

@XiLongZheng

XiLongZheng commented Oct 24, 2017

@luxas, just curious whether this is still targeted for 1.9?

k8s-merge-robot added a commit to kubernetes/kubernetes that referenced this issue Oct 26, 2017

Merge pull request #54539 from jamiehannaford/add-ha-feature-gate
Automatic merge from submit-queue (batch tested with PRs 54593, 54607, 54539, 54105). If you want to cherry-pick this change to another branch, please follow the instructions here: https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md

Add HA feature gate and minVersion validation

**What this PR does / why we need it**:

As we add more feature gates, there might be occasions where a feature is only available on newer releases of K8s. If a user makes a mistake, we should notify them as soon as possible in the init procedure and not let them go down the path of hard-to-debug component issues.

Specifically with HA, we ideally need the new `TaintNodesByCondition` (added in v1.8.0 but working in v1.9.0).

**Which issue this PR fixes:**

kubernetes/kubeadm#261
kubernetes/kubeadm#277

**Release note**:
```release-note
Feature gates now check minimum versions
```

/cc @kubernetes/sig-cluster-lifecycle-pr-reviews @luxas @timothysc
@luxas
Member

luxas commented Oct 27, 2017

@XiLongZheng Yes, we hope so. Latest design doc we'll check into Github next week: https://docs.google.com/document/d/1P3oUJ_kdaRSTlGONujadGBpYegjn4RjBNZLHZ4zU7lI/edit#

@klausenbusk

klausenbusk commented Nov 1, 2017

> @XiLongZheng Yes, we hope so. Latest design doc we'll check into Github next week: https://docs.google.com/document/d/1P3oUJ_kdaRSTlGONujadGBpYegjn4RjBNZLHZ4zU7lI/edit#

I remember a (rather complicated?) proposal for the load balancer issue, but can't find it right now. Does anyone else remember that proposal? IIRC it was something about tweaking /etc/hosts when bootstrapping and then relying on the kubernetes service.

@luxas
Member

luxas commented Nov 1, 2017

@klausenbusk Yeah, but we dropped that as it actually didn't work well after some experimenting.
Now we're thinking about reusing kube-proxy for load balancing to the VIP short-term (or using a real, external LB obvs.), and long-term using something like Envoy.

k8s-merge-robot added a commit to kubernetes/kubernetes that referenced this issue Nov 1, 2017

Merge pull request #54543 from jamiehannaford/self-hosted-etcd-api
Automatic merge from submit-queue (batch tested with PRs 49840, 54937, 54543). If you want to cherry-pick this change to another branch, please follow the instructions here: https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md

Add self-hosted etcd API to kubeadm

**What this PR does / why we need it**:

This PR is part of a larger set that implements self-hosted etcd. This PR takes a first step by adding:

1. new API types in `cmd/kubeadm/app/apis` for configuring self-hosted etcd 
2. new Go types in `cmd/kubeadm/app/phases/etcd/spec` used for constructing EtcdCluster CRDs for the etcd-operator. The reason we define these in trunk is because kubeadm cannot import `github.com/coreos/etcd-operator` as a dependency until it's in its own repo. Until then, we need to redefine the structs in our codebase.

**Which issue this PR fixes**:

kubernetes/kubeadm#261
kubernetes/kubeadm#277

**Special notes for your reviewer**:

This is the first step PR in order to save reviewers from a goliath PR

**Release note**:
```release-note
NONE
```
@jethrogb

jethrogb commented Nov 9, 2017

I deployed this manually today and ran into the following issue when testing master failover: the IP address specified in kubeadm join is used only for discovery. After that, the IP address specified in the cluster-info ConfigMap is used. (Debugging this was extremely painful.) In my case, that ConfigMap contained the failed master's IP. It would be good to have a solution here.
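
For anyone else debugging this: the cluster-info ConfigMap referred to above lives in the kube-public namespace and embeds a stripped-down kubeconfig whose server field is what joining nodes end up using, roughly as sketched below (the address is a made-up example). If that field holds a single master's IP rather than a VIP or load-balanced endpoint, joins keep targeting that master even after it fails.

```yaml
# Rough sketch of the cluster-info ConfigMap; the address is illustrative only.
apiVersion: v1
kind: ConfigMap
metadata:
  name: cluster-info
  namespace: kube-public
data:
  kubeconfig: |
    apiVersion: v1
    kind: Config
    clusters:
    - name: ""
      cluster:
        certificate-authority-data: <base64-encoded CA>
        server: https://10.0.0.11:6443   # the endpoint joining nodes actually use
```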

@KeithTt

KeithTt commented Nov 16, 2017

I've hoped for this feature for a long time...

@bitgandtter

bitgandtter commented Nov 20, 2017

Is this feature going to make it into 1.9?

@luxas
Member

luxas commented Nov 20, 2017

No, not in its entirety. We have some work in progress, but due to the really tight schedule it's not gonna make alpha in v1.9. Instead we'll focus on documenting how to do HA "manually" (#546). There really is a lot of work to make this happen; nobody has done this kind of hands-off HA installing flow for k8s yet AFAIK, so we're falling back on what everyone else does for now.

@luxas luxas modified the milestones: v1.9, v1.10 Nov 20, 2017

@discordianfish

discordianfish commented Dec 13, 2017

@luxas FWIW, I think tectonic-installer (though bootkube-based) is closest to the goals of kubeadm; worth having a look.

@discordianfish

discordianfish commented Jan 16, 2018

Here is my stab at kubeadm HA on AWS: https://github.com/itskoko/kubecfn

@stealthybox
Contributor

stealthybox commented Jan 16, 2018

@discordianfish wow, that looks like a lot of work -- nice job

@kapilt

kapilt commented Jan 17, 2018

@discordianfish that's really nice work, and awesome that you've worked around all the bugs :-) +1

@jamiehannaford
Member

jamiehannaford commented Jan 17, 2018

Awesome job @discordianfish. If you had to do any workarounds to get kubeadm working (i.e. to fix kubeadm-specific bugs or shortcomings) would you mind opening an issue so we can document them?

@kapilt

kapilt commented Jan 18, 2018

@jamiehannaford The biggest bug that I see going through the repo is #411: effectively you have to rewrite the kubelet config that kubeadm generates, to point it from an IP to a DNS name for the masters.

The biggest shortcoming being worked around seems to be assuming ownership of etcd cluster management. The rest seems cloud-provider specific (a Lambda to maintain the DNS mapping as master hosts come and go), etc.

[edit] He filed one on the kubeadm issues (#609) and referenced it from the kubecfn repo. Notionally this is also related to #411: basically the same issue of not respecting the CLI advertise-address parameter as a URL, converting it early to an IP, and then writing that IP to everything kubeadm touches for the master address.
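
Concretely, the rewrite described above amounts to editing the server field in the kubeconfig files kubeadm writes (e.g. /etc/kubernetes/kubelet.conf), roughly as in the sketch below; the addresses and DNS name are placeholders, not taken from kubecfn.

```yaml
# Excerpt of /etc/kubernetes/kubelet.conf (a kubeconfig); values are placeholders.
clusters:
- name: kubernetes
  cluster:
    certificate-authority-data: <base64-encoded CA>
    # As generated by kubeadm (resolved master IP):
    #   server: https://10.0.0.11:6443
    # After the workaround (stable DNS name for the masters):
    server: https://masters.example.internal:6443
```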

@discordianfish

discordianfish commented Jan 18, 2018

#411 doesn't affect kubecfn since I'm not using kubeadm on the workers, because I couldn't get the token auth to play well with the multi-master HA setup. Instead I'm just using the admin.conf, which isn't ideal and is (now) tracked in itskoko/kubecfn#6.

#609 is the biggest pain point right now. The workaround is ugly, to say the least (overwriting the ConfigMap after each kubeadm run).

Another minor issue is that some paths are hardcoded in kubeadm, making it harder to pre-generate configs. For that I have to run kubeadm in a Docker container. In general, for this project I would have preferred if kubeadm had some offline mode where all it does is generate the config, similar to how bootkube does it.

Everything else is etcd related, which is IMO by far the hardest part to get right in a reliable fashion, even with CloudFormation and the signaling. So if the overall experience of setting up an HA cluster should be improved, maybe the etcd bootstrapping process could be made easier. One way would be to ignore SAN/CN completely, which IMO should still be pretty much as secure as checking it. For that I opened etcd-io/etcd#8912.

Beside all this, there are some small Kubernetes issues I filed which would have saved me tons of time. Things like:

@schmitch

schmitch commented Jan 18, 2018

What I found to be the biggest pain is that if I want to use keepalived as the "load balancer", and want to run keepalived in Kubernetes, I first need to bootstrap the Kubernetes "master cluster" with one master IP and then rewrite all configs to point to the keepalived IP.
It would be better if the Kubernetes masters only used the 'advertise api server' IP to talk with worker nodes, and the master servers communicated with their local IPs, so that one can actually pre-register an "advertise-api-server" IP that does not exist yet.

My setup is, besides that, really simple:

  • have a static haproxy pod on every master that load-balances all the master IPs (could probably also be just a DaemonSet on all masters, but I wanted to create an /etc/kubernetes/haproxy.conf and use it instead of a ConfigMap/volume; a ConfigMap is probably more sane) (a minimal sketch follows after this comment)
  • have a kube-system keepalived DaemonSet on all master servers (https://github.com/kubernetes/contrib/tree/master/keepalived-vip, but in the kube-system namespace)

So basically I just build a Kubernetes cluster with one master, create the keepalived service, register two other masters with the keepalived IP, and the ConfigMap gets rewritten when kubeadm init is called a second/third time. After that I just need to adjust all IPs (kubeadm, kubelet, admin.conf, whatever) inside the first master node to point to the keepalived IP (well, I actually used #546 (comment) to bring up my new masters and rewrite all configs/the ConfigMap to point to the keepalived IP).

And done. Basically the whole setup is self-contained and does not need any external load balancer or whatever; you just need a re-routable IP in your network.

Edit: I also used Ignition/cloud-config to bootstrap CoreOS on VMware. It's really simple, and my etcd runs on CoreOS and uses rkt (installed via cloud-config). I actually generated the etcd PKI before creating the nodes. I can write a guide if somebody needs it, and provide all the necessary configs.

My next try would be to run the kubelet via rkt, but I'm not sure if that plays well with kubeadm.
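
As referenced in the first bullet above, here is a minimal sketch of such a static haproxy pod. It is not taken from this setup: the image tag and the mounted config path (reusing the /etc/kubernetes/haproxy.conf mentioned above) are assumptions.

```yaml
# Sketch of a static haproxy pod: drop a file like this into
# /etc/kubernetes/manifests/ on each master. Image tag and paths are assumptions.
apiVersion: v1
kind: Pod
metadata:
  name: haproxy
  namespace: kube-system
spec:
  hostNetwork: true                # expose the frontend directly on the master
  containers:
  - name: haproxy
    image: haproxy:1.7
    volumeMounts:
    - name: haproxy-conf
      mountPath: /usr/local/etc/haproxy/haproxy.cfg   # where the official image reads its config
      readOnly: true
  volumes:
  - name: haproxy-conf
    hostPath:
      path: /etc/kubernetes/haproxy.conf   # host file listing all master IPs as backends
```

The keepalived DaemonSet from the second bullet then keeps the VIP pointed at a master where this pod is serving.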

@timothysc timothysc modified the milestones: v1.10, v1.11 Jan 24, 2018

@kapilt

kapilt commented Jan 30, 2018

@discordianfish FWIW the issue with #411, I think, underlies the ConfigMap issue, and also causes the IP to be written out in the master config, which the kubecfn Makefile uses sed on to restore the cluster name. It's basically that, early on, the DNS name given gets overwritten by the resolved IP, and that then gets written out everywhere kubeadm touches.

@timothysc timothysc added the triaged label Jan 31, 2018

@timothysc
Member

timothysc commented Apr 7, 2018

Closing this original parent issue as plans have changed; we will have updated issues and PRs coming in 1.11.

/cc @fabriziopandini

@timothysc timothysc closed this Apr 7, 2018

@discordianfish

discordianfish commented Apr 8, 2018

@timothysc Where can I read about the new plan?

@chulkilee

chulkilee commented May 2, 2018

I believe the new issue for the new plan is #751

@jethrogb

jethrogb commented May 2, 2018

Somewhat surprised kubernetes/community#1707 wasn't mentioned here when that PR was originally opened.
