
Automate stacked etcd into kubeadm join --control-plane workflow #1123

Closed
fabriziopandini opened this issue Sep 18, 2018 · 10 comments

@fabriziopandini
Member

Stacked etcd is currently a manual procedure, described in https://kubernetes.io/docs/setup/independent/high-availability/.

However, kubeadm could automate the stacked etcd procedure as a new step of the kubeadm join --control-plane workflow.

Some design decisions should be made before implementing:

  • A) Stacked etcd should be a “transparent” evolution of the current local etcd mode, or B) users will be requested to explicitly opt in to stacked etcd, e.g. by using a dedicated config type?
  • C) The number of stacked etcd members should be “tied” to the number of control plane instances, or D) should etcd scaling be separated from control plane scaling (e.g. kubeadm join --etcd)?

Considering the goal of keeping kubeadm simple and maintainable, IMO the preferred options are A) and C)… wdyt?

cc @detiber @chuckha @timothysc

@fabriziopandini fabriziopandini added help wanted Denotes an issue that needs help from a contributor. Must meet "help wanted" guidelines. area/HA priority/important-longterm Important over the long term, but may not be staffed and/or may need multiple releases to complete. area/UX kind/feature Categorizes issue or PR as related to a new feature. labels Sep 18, 2018
@fabriziopandini fabriziopandini added this to the v1.13 milestone Sep 18, 2018
@chuckha

chuckha commented Sep 18, 2018

C seems like the simplest solution, but I'd love to hear more about A. I think we've really got a couple of use cases: stacked control plane nodes scale out to some number of nodes before etcd needs dedicated hosts, and then it would be great if we had a path to switch to external/dedicated hosts.

I'd rule out B and D for now unless there is a compelling reason to add that complexity.

@fabriziopandini
Member Author

@chuckha

I'd love to hear more about A (Stacked etcd should be a “transparent” evolution of the current local etcd mode)

From what I understand, stacked etcd is an etcd instance like local etcd, with the difference that it listens on a public IP instead of 127.0.0.1 and has a bunch of additional flags/certificate SANs.
Why not change the local etcd static pod manifest to be equal to the stacked etcd manifest (see the sketch after the list below)?

  • in case of a new cluster, the “new local etcd” will work with a single etcd member, like the “old local etcd” (no regression), and all new clusters will be natively ready for adding new control plane & etcd members
  • in case of an existing cluster, kubeadm upgrade will automatically turn the “old local etcd” into the “new local etcd”, so basically all v1.13 clusters will be ready for scaling up the control plane and etcd
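For illustration only (a sketch of the key difference, not the exact flag set — the concrete manifests are shown later in this thread; <node-ip> is a placeholder):

- etcd ("old" local etcd: loopback only)
    - --listen-client-urls=https://127.0.0.1:2379
    - --advertise-client-urls=https://127.0.0.1:2379
- etcd ("new" local etcd / stacked etcd: also reachable on a routable node address)
    - --listen-client-urls=https://127.0.0.1:2379,https://<node-ip>:2379
    - --listen-peer-urls=https://<node-ip>:2380
    - --advertise-client-urls=https://<node-ip>:2379
    ....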

Does this sound reasonable to you?

it would be great if we had a path to switch to external/dedicated hosts

Great suggestions, let's keep this in mind as well

@detiber
Member

detiber commented Sep 20, 2018

A) Stacked etcd should be a “transparent” evolution of the current local etcd mode, or

If I'm understanding this option, it would basically just extend the existing local etcd mode to support the additional flags, SANs, etc. that the stacked deployment currently uses, and is mainly about providing an upgrade path for existing local etcd-based deployments rather than providing HA support itself. Is that correct?

That said, it would require config changes to make it work, since we would need to expand the per-node configuration to include etcd config/overrides for things such as which IP, hostname, or SANs to use (if the defaults are not sufficient).
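Something along these lines could carry those overrides (rough sketch only; the field names mirror the existing local etcd options in the kubeadm config, while the API version shown and the exact shape of the per-node overrides are assumptions, not a decided design):

apiVersion: kubeadm.k8s.io/v1alpha3
kind: ClusterConfiguration
etcd:
  local:
    serverCertSANs:        # extra SANs for the etcd server certificate, if the defaults are not sufficient
    - "10.10.10.11"
    peerCertSANs:          # extra SANs for the etcd peer certificate
    - "10.10.10.11"
    extraArgs:             # per-node flag overrides, e.g. which addresses etcd should listen on
      listen-client-urls: "https://127.0.0.1:2379,https://10.10.10.11:2379"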

B) users will be requested to explicitly opt in to stacked etcd, e.g. by using a dedicated config type?

I don't like this option, as it requires users to make a decision about HA/non-HA support before starting.

C) The number of stacked etcd members should be “tied” to the number of control plane instances, or

+1 for this; if there is a need to have a different number of etcd hosts vs control plane instances, then external etcd should be used instead.

D) should etcd scaling be separated from control plane scaling (e.g. kubeadm join --etcd)?

While I could see some value in this, the ability to use it would be limited since we don't provide a way to init a single etcd instance. I would expect that workflow to look like the following:

  • <host 1> kubeadm init --etcd
  • <host 2> kubeadm join --etcd
  • <host 3> kubeadm join --etcd

Where the entire etcd cluster is bootstrapped prior to bootstrapping the control plane instances. That workflow would require kubeadm to have access to the client certificate to manipulate etcd, which is not currently the case. I'm not exactly sure how we are currently handling this for extending the control plane.

The nice thing about this approach is that it would simplify the external etcd story as well, but I think it should be in addition to C rather than in place of C if we support that workflow. I think we'd also probably want to break them out into separate high level commands, since we wouldn't necessarily be fully configuring the kubelets to join the overall cluster in that use case.

@fabriziopandini
Member Author

@detiber happy to see we are on the same page here!

it would basically just extend the existing local etcd mode ...rather than providing HA support itself

Yes, but with the addition that, before adding new etcd members, we are going to call etcdctl member add on one of the existing members.

This will increase the HA of the cluster, with the caveat that each API server uses only the etcd endpoint of its own local etcd (instead of the list of etcd endpoints). So if an etcd member fails, all the control plane components on the same node will fail and everything will be switched to another control plane node.

NB. This can be improved to a certain extent by passing the API server the list of etcd endpoints known at the moment of join.
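For illustration (a sketch only, reusing the example IPs that appear later in this thread), the kube-apiserver flag in question:

- kube-apiserver (today: only its co-located etcd member)
    - --etcd-servers=https://10.10.10.11:2379
- kube-apiserver (possible improvement: all endpoints known at join time)
    - --etcd-servers=https://10.10.10.11:2379,https://10.10.10.12:2379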

it would require config changes to make it work

Yes, but I consider these changes less invasive than creating a whole new etcd type.
On top of that, I think we can use the advertise address and hostname as reasonable defaults, so the user will be required to set additional config options only in a few cases.

I think we'd also probably want to break them out into separate high level commands
I think it should be in addition to C rather than in place of C

+1
If we want to have a sound story around etcd alone, this should be addressed properly. For the time being I will be more than happy to improve the story for the control plane with etcd tied to it, which has been part of kubeadm since its inception.

@detiber
Member

detiber commented Sep 20, 2018

@fabriziopandini For the issue with the control plane being fully dependent on the local etcd, there is an issue to track the lack of etcd auto sync support within Kubernetes itself: kubernetes/kubernetes#64742

@fabriziopandini
Member Author

/lifecycle active

@detiber @chuckha @timothysc
I have a working prototype of the approach discussed above 😃

  1. kubeadm init > creates a local etcd instance similar to the one described here. The main difference vs. now is that it uses another IP address instead of 127.0.0.1:
- etcd
    - --advertise-client-urls=https://10.10.10.11:2379  
    - --initial-advertise-peer-urls=https://10.10.10.11:2380
    - --initial-cluster=master1=https://10.10.10.11:2380
    - --listen-client-urls=https://127.0.0.1:2379,https://10.10.10.11:2379
    - --listen-peer-urls=https://10.10.10.11:2380
    ....
  2. kubeadm join --control-plane > adds a second etcd instance similar to the one described here. In the case of a joining etcd instance, the etcd manifest is slightly different: it contains all the existing etcd members + the joining one, and the --initial-cluster-state flag is set to existing:
- etcd
    - --initial-cluster=master1=https://10.10.10.11:2380,master2=https://10.10.10.12:2380
    - --initial-cluster-state=existing
    ....

So far so good.

Now the tricky question: kubeadm upgrade....

When kubeadm executes upgrades it will recreate the etcd manifest. Are there any settings I should take care of because I'm upgrading an etcd cluster instead of a single etcd instance?
More specifically, are there any recommended values for --initial-cluster and --initial-cluster-state, or do I simply not care because my etcd cluster already exists and I'm basically changing only the etcd binary?

@k8s-ci-robot k8s-ci-robot added the lifecycle/active Indicates that an issue or PR is actively being worked on by a contributor. label Oct 3, 2018
@fabriziopandini
Member Author

@detiber @chuckha @timothysc
from the CoreOS docs:

--initial prefix flags are used in bootstrapping (static bootstrap, discovery-service bootstrap or runtime reconfiguration) a new member, and ignored when restarting an existing member.

so it doesn't matter which values I assign to --initial-cluster and --initial-cluster-state.

Considering this, my idea is to keep the upgrade workflow "simple" and generate the new etcd manifest without compiling --initial-cluster with the full list of etcd members.
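Concretely, a sketch (reusing the example IPs above) of what the regenerated manifest on the second control plane node could keep:

- etcd
    - --initial-cluster=master2=https://10.10.10.12:2380   # only the local member; ignored anyway, since the member's data dir already exists
    ....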

Opinions?

@fabriziopandini
Member Author

Last bit:
What IP address should we use for etcd? If we are going to use the API server advertise address for etcd as well, this will simplify things a lot...

@fabriziopandini
Member Author

/close

@k8s-ci-robot
Contributor

@fabriziopandini: Closing this issue.

In response to this:

/close

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.
