
Automate stacked etcd into kubeadm join --control-plane workflow #1123

Closed
fabriziopandini opened this issue Sep 18, 2018 · 10 comments

@fabriziopandini
Member

Stacked etcd is currently a manual procedure, described in https://kubernetes.io/docs/setup/independent/high-availability/.

However, kubeadm could automate the stacked etcd procedure as a new step of the kubeadm join --control-plane workflow.

Some design decisions should be made before implementing:

  • A) Stacked etcd should be a “transparent” evolution of the current local etcd mode, or B) users will be requested to explicitly opt in to stacked etcd, e.g. by using a dedicated config type?
  • C) The number of stacked etcd members should be “tied” to the number of control plane instances, or D) should etcd scaling be separated from control plane scaling (e.g. kubeadm join --etcd)?

Considering the goal of keeping kubeadm simple and maintainable, IMO the preferred options are A) and C)… wdyt?

cc @detiber @chuckha @timothysc

@fabriziopandini fabriziopandini added help wanted Denotes an issue that needs help from a contributor. Must meet "help wanted" guidelines. area/HA priority/important-longterm Important over the long term, but may not be staffed and/or may need multiple releases to complete. area/UX kind/feature Categorizes issue or PR as related to a new feature. labels Sep 18, 2018
@fabriziopandini fabriziopandini added this to the v1.13 milestone Sep 18, 2018
@chuckha

chuckha commented Sep 18, 2018

C seems like the simplest solution, but I'd love to hear more about A. I think we've really got a couple of use cases: stacked control plane nodes scale out to some number of nodes before etcd needs dedicated hosts, and then it would be great if we had a path to switch to external/dedicated hosts.

I'd rule out B and D for now unless there is a compelling reason to add that complexity.

@fabriziopandini
Member Author

@chuckha

I'd love to hear more about A (Stacked etcd should be a “transparent” evolution of the current local etcd mode)

From what I understand, stacked etcd is an etcd instance like local etcd, with the difference that it listens on a public IP instead of 127.0.0.1 and has a bunch of additional flags/certificate SANs.
Why not change the local etcd static pod manifest to be equal to the stacked etcd manifest (see the sketch after the list below)?

  • in case of a new cluster, the “new local etcd” will work with a single etcd member, like the “old local etcd” (no regression), and all new clusters will be natively ready for adding new control plane & etcd members
  • in case of an existing cluster, kubeadm upgrade will automatically turn the “old local etcd” into the “new local etcd”, so basically all v1.13 clusters will be ready for scaling up the control plane and etcd
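For illustration only (a sketch of the key difference, not the exact flag set — the concrete manifests are shown later in this thread; <node-ip> is a placeholder):

- etcd ("old" local etcd: loopback only)
    - --listen-client-urls=https://127.0.0.1:2379
    - --advertise-client-urls=https://127.0.0.1:2379
- etcd ("new" local etcd / stacked etcd: also reachable on a routable node address)
    - --listen-client-urls=https://127.0.0.1:2379,https://<node-ip>:2379
    - --listen-peer-urls=https://<node-ip>:2380
    - --advertise-client-urls=https://<node-ip>:2379
    ....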

Does this sound reasonable to you?

it would be great if we had a path to switch to external/dedicated hosts

Great suggestions, let's keep this in mind as well

@detiber
Member

detiber commented Sep 20, 2018

A) Stacked etcd should be a “transparent” evolution of the current local etcd mode, or

If I'm understanding this option, it would basically just extend the existing local etcd mode to support the additional flags, SANs, etc. that the stacked deployment currently uses, and is mainly about providing an upgrade path for existing local etcd-based deployments rather than providing HA support itself. Is that correct?

That said, it would require config changes to make it work, since we would need to expand the per-node configuration to include etcd config/overrides for things such as which IP, hostname, or SANs to use (if the defaults are not sufficient).
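Something along these lines could carry those overrides (rough sketch only; the field names mirror the existing local etcd options in the kubeadm config, while the API version shown and the exact shape of the per-node overrides are assumptions, not a decided design):

apiVersion: kubeadm.k8s.io/v1alpha3
kind: ClusterConfiguration
etcd:
  local:
    serverCertSANs:        # extra SANs for the etcd server certificate, if the defaults are not sufficient
    - "10.10.10.11"
    peerCertSANs:          # extra SANs for the etcd peer certificate
    - "10.10.10.11"
    extraArgs:             # per-node flag overrides, e.g. which addresses etcd should listen on
      listen-client-urls: "https://127.0.0.1:2379,https://10.10.10.11:2379"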

B) users will be requested to explicitly opt in to stacked etcd, e.g. by using a dedicated config type?

I don't like this option, as it requires users to make a decision about HA/non-HA support before starting.

C) The number of stacked etcd members should be “tied” to the number of control plane instances, or

+1 for this; if there is a need to have a different number of etcd hosts vs control plane instances, then external etcd should be used instead.

D) should etcd scaling be separated from control plane scaling (e.g. kubeadm join --etcd)?

While I could see some value in this, the ability to use it would be limited since we don't provide a way to init a single etcd instance. I would expect that workflow to look like the following:

  • <host 1> kubeadm init --etcd
  • <host 2> kubeadm join --etcd
  • <host 3> kubeadm join --etcd

Where the entire etcd cluster is bootstrapped prior to bootstrapping the control plane instances. That workflow would require kubeadm to have access to the client certificate to manipulate etcd, which is not currently the case. I'm not exactly sure how we are currently handling this for extending the control plane.

The nice thing about this approach is that it would simplify the external etcd story as well, but I think it should be in addition to C rather than in place of C if we support that workflow. I think we'd also probably want to break them out into separate high level commands, since we wouldn't necessarily be fully configuring the kubelets to join the overall cluster in that use case.

@fabriziopandini
Member Author

@detiber happy to see we are on the same page here!

it would basically just extend the existing local etcd mode ...rather than providing HA support itself

Yes, but with the addition that, before adding new etcd members, we are going to call etcdctl member add on one of the existing members.

This will increase the HA of the cluster, with the caveat that each API server uses only the etcd endpoint of its own local etcd (instead of the list of etcd endpoints). So if an etcd member fails, all the control plane components on the same node will fail and everything will be switched to another control plane node.

NB. This can be improved to a certain extent by passing the API server the list of etcd endpoints known at the moment of join.
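For illustration (a sketch only, reusing the example IPs that appear later in this thread), the kube-apiserver flag in question:

- kube-apiserver (today: only its co-located etcd member)
    - --etcd-servers=https://10.10.10.11:2379
- kube-apiserver (possible improvement: all endpoints known at join time)
    - --etcd-servers=https://10.10.10.11:2379,https://10.10.10.12:2379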

it would require config changes to make it work

Yes, but I consider these changes less invasive than creating a whole new etcd type.
On top of that, I think we can use the advertise address and hostname as reasonable defaults, so the user will be required to set additional config options only in a few cases.

I think we'd also probably want to break them out into separate high level commands
I think it should be in addition to C rather than in place of C

+1
If we want to have a sound story around etcd alone, this should be addressed properly. For the time being I will be more than happy to improve the story for the control plane with etcd tied to it, which has been part of kubeadm since its inception.

@detiber
Member

detiber commented Sep 20, 2018

@fabriziopandini For the issue with the control plane being fully dependent on the local etcd, there is an issue to track the lack of etcd auto sync support within Kubernetes itself: kubernetes/kubernetes#64742

@fabriziopandini
Member Author

/lifecycle active

@detiber @chuckha @timothysc
I have a working prototype of the approach discussed above 😃

  1. kubeadm init > creates a local etcd instance similar to the one described here. The main difference vs. now is that it uses another IP address instead of 127.0.0.1:
- etcd
    - --advertise-client-urls=https://10.10.10.11:2379  
    - --initial-advertise-peer-urls=https://10.10.10.11:2380
    - --initial-cluster=master1=https://10.10.10.11:2380
    - --listen-client-urls=https://127.0.0.1:2379,https://10.10.10.11:2379
    - --listen-peer-urls=https://10.10.10.11:2380
    ....
  2. kubeadm join --control-plane > adds a second etcd instance similar to the one described here. In the case of a joining etcd instance, the etcd manifest is slightly different: it contains all the existing etcd members + the joining one, and the --initial-cluster-state flag is set to existing:
- etcd
    - --initial-cluster=master1=https://10.10.10.11:2380,master2=https://10.10.10.12:2380
    - --initial-cluster-state=existing
    ....

So far so good.

Now the tricky question: kubeadm upgrade....

When kubeadm executes upgrades it will recreate the etcd manifest. Are there any settings I should take care of because I'm upgrading an etcd cluster instead of a single etcd instance?
More specifically, are there any recommended values for --initial-cluster and --initial-cluster-state, or do I simply not care because my etcd cluster already exists and I'm basically changing only the etcd binary?

@k8s-ci-robot k8s-ci-robot added the lifecycle/active Indicates that an issue or PR is actively being worked on by a contributor. label Oct 3, 2018
@fabriziopandini
Member Author

@detiber @chuckha @timothysc
from the CoreOS docs:

--initial prefix flags are used in bootstrapping (static bootstrap, discovery-service bootstrap or runtime reconfiguration) a new member, and ignored when restarting an existing member.

so it doesn't matter which values I assign to --initial-cluster and --initial-cluster-state.

Considering this, my idea is to keep the upgrade workflow "simple" and generate the new etcd manifest without compiling --initial-cluster with the full list of etcd members.
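Concretely, a sketch (reusing the example IPs above) of what the regenerated manifest on the second control plane node could keep:

- etcd
    - --initial-cluster=master2=https://10.10.10.12:2380   # only the local member; ignored anyway, since the member's data dir already exists
    ....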

Opinions?

@fabriziopandini
Member Author

Last bit:
What IP address should we use for etcd? If we are going to use the API server advertise address for etcd as well, this will simplify things a lot...

@fabriziopandini
Member Author

/close

@k8s-ci-robot
Contributor

@fabriziopandini: Closing this issue.

In response to this:

/close

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.
