Concern about node count for minimal HA control plane with external etcd #42691

Closed
tjanson opened this issue Aug 23, 2023 · 14 comments
Labels
kind/bug: Categorizes issue or PR as related to a bug.
language/en: Issues or PRs related to English language.
needs-triage: Indicates an issue or PR lacks a `triage/foo` label and requires one.
sig/architecture: Categorizes an issue or PR as relevant to SIG Architecture.
sig/cluster-lifecycle: Categorizes an issue or PR as relevant to SIG Cluster Lifecycle.
triage/duplicate: Indicates an issue is a duplicate of other open issue.

Comments

@tjanson
Contributor

tjanson commented Aug 23, 2023

The section External etcd topology on the page Options for Highly Available Topology of the kubeadm cluster setup section states:

> A minimum of three hosts for control plane nodes and three hosts for etcd nodes are required for an HA cluster with this [external etcd] topology.

I may be mistaken, but wouldn't the minimum number of control plane nodes in this case be two? (Though perhaps not advisable, at least technically.) That gives us a redundant pair of each control plane component (apiserver, controller-manager and scheduler), as well as the HA three-node etcd cluster.
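
To make the arithmetic behind this question concrete, here is a minimal sketch. It rests on two assumptions that the thread itself does not confirm: that a single surviving instance of each control plane component is enough to keep the control plane serving (for example, API servers behind a load balancer, with controller-manager and scheduler using leader election), and that etcd needs a strict majority of its members. The class and method names are hypothetical and only for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ExternalEtcdTopology:
    """Hypothetical model of a cluster with separate control plane and etcd hosts."""

    control_plane_hosts: int
    etcd_hosts: int

    def control_plane_failures_tolerated(self) -> int:
        # Assumption: one surviving instance of each component is sufficient.
        return max(self.control_plane_hosts - 1, 0)

    def etcd_failures_tolerated(self) -> int:
        # etcd needs a majority (quorum) of members to accept writes.
        quorum = self.etcd_hosts // 2 + 1
        return max(self.etcd_hosts - quorum, 0)

    def survives_any_single_host_failure(self) -> bool:
        # Both tiers must tolerate the loss of at least one host.
        return (
            self.control_plane_failures_tolerated() >= 1
            and self.etcd_failures_tolerated() >= 1
        )


if __name__ == "__main__":
    for cp, etcd in [(1, 3), (2, 3), (3, 3)]:
        topology = ExternalEtcdTopology(cp, etcd)
        print(cp, etcd, topology.survives_any_single_host_failure())
```

Under these assumptions, two control plane hosts plus three etcd hosts survive any single host failure, just as three plus three do. The model says nothing about capacity headroom or failures during upgrades, which may be part of why maintainers recommend three.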

@k8s-ci-robot
Contributor

This issue is currently awaiting triage.

SIG Docs takes a lead on issue triage for this website, but any Kubernetes member can accept issues by applying the triage/accepted label.

The triage/accepted label can be added by org members by writing /triage accepted in a comment.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@k8s-ci-robot added the needs-triage label on Aug 23, 2023
@tjanson
Contributor Author

tjanson commented Aug 23, 2023

I see now this is a duplicate of (stale, closed) #33033.

/language en
/kind bug
/sig architecture

@k8s-ci-robot added the language/en, kind/bug, and sig/architecture labels on Aug 23, 2023
@neolit123
Member

/sig cluster-lifecycle

there was a blog post at k8s.io about HA written by Steve Wong, but i cannot find it.

HA is an opinionated area in computing. 2 is considered the minimum, where the 2nd server is the fallback/redundancy server. however the argument here is that 2 is not really redundancy. 2 provides the fallback, yet 3 is really what provides the redundancy - i.e. "you have the backup of the backup, which may be redundant".

upstream kubeadm is just one k8s distribution with its recommendations of 3 CP nodes. yet other distributions like openshift also run 3 as the minimum HA:

> At a minimum, an OpenShift cluster contains 2 worker nodes in addition to 3 control plane nodes.

https://access.redhat.com/solutions/5034771

personally, i would consider < 3 in k8s as non-HA, but users can make the choice.

/close

@k8s-ci-robot added the sig/cluster-lifecycle label on Aug 23, 2023
@k8s-ci-robot
Contributor

@neolit123: Closing this issue.


@tjanson
Contributor Author

tjanson commented Aug 23, 2023

Excuse me for being blunt, but I don't think you've given this issue the consideration it deserves and requires. The key point is the distinction between etcd cluster nodes and Kubernetes control plane nodes (and their effect on HA), which your comment does not address and which you do not seem to have considered.

> HA is an opinionated area in computing.

We're specifically discussing the HA requirements of the Kubernetes control plane. That is not a matter of opinion, but fact.

> 2 is considered the minimum, where the 2nd server is the fallback/redundancy server. however the argument here is that 2 is not really redundancy. 2 provides the fallback, yet 3 is really what provides the redundancy

Again excuse my bluntness, but that's an oversimplified, imprecise portrayal of HA in the context of Kubernetes. It is not sufficient to consider just these broad terms in a discussion of etcd and control plane components.

> upstream kubeadm is just one k8s distribution with its recommendations of 3 CP nodes. yet other distributions like openshift also run 3 as the minimum HA

Yes, they do so because of a stacked etcd topology. The docs section this issue refers to is about a different topology (external etcd). That exact distinction is the entire point of the issue.

> personally, i would consider < 3 in k8s as non-HA, but users can make the choice.

Again, this isn't about your (or anyone else's) personal opinion or recommendation, it is about the technical minimum of K8s control plane components/nodes.

I request that you reopen the issue.

@neolit123
Member

neolit123 commented Aug 23, 2023

> Excuse me for being blunt, but I don't think you've given this issue the consideration it deserves and requires. The key point is the distinction between etcd cluster nodes and Kubernetes control plane nodes (and their effect on HA), which your comment does not address and which you do not seem to have considered.

my comment is specifically about the external etcd topology. in short, the recommendation of the maintainers is to have 3 cp machines even if etcd is not run on them. if users do not agree with our ideas of HA they can run less or more cp machines.

@tjanson
Contributor Author

tjanson commented Aug 23, 2023

I request that you reopen this issue so that a second org member can give their opinion. E.g., @sftim, who was active in the other issue (I'm also fine with reopening the stale #33033 instead of this issue).

@neolit123
Member

/reopen

@k8s-ci-robot
Contributor

@neolit123: Reopened this issue.


@k8s-ci-robot reopened this on Aug 23, 2023
@sftim
Contributor

sftim commented Aug 23, 2023

> A minimum of three hosts for control plane nodes and three hosts for etcd nodes are required for an HA cluster with this [external etcd] topology.

The minimum number of control plane nodes for Kubernetes to work is one. However, the minimum recommended number of etcd members is three, because:

  • etcd recommends an odd number of cluster members (for the baseline healthy config)
  • three is the smallest odd number greater than two
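
As a quick illustration of why odd member counts are preferred, the sketch below computes majority quorum and failure tolerance for small cluster sizes. The function names are hypothetical; the only assumption is that the cluster needs a strict majority of members to make progress, which is how etcd's Raft-based consensus works.

```python
def quorum(members: int) -> int:
    """Smallest number of members that forms a strict majority."""
    return members // 2 + 1


def failure_tolerance(members: int) -> int:
    """How many members can fail while a majority still remains."""
    return members - quorum(members)


if __name__ == "__main__":
    print("members quorum tolerated_failures")
    for n in range(1, 6):
        print(n, quorum(n), failure_tolerance(n))
```

A two-member cluster tolerates no failures, and a fourth member adds nothing over three, so three is the smallest size that survives the loss of one member.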

So far, so uncontroversial. How about the API server, kube-controller-manager, scheduler, etc.?

For the external etcd topology, maybe you can get away with two further nodes, relying on the etcd cluster to support leader election etc. I'm a lead for Docs, not API machinery, so I can't comment authoritatively. However - it sounds plausible.

@sftim
Contributor

sftim commented Aug 23, 2023

/retitle Concern about node count for minimal HA control plane with external etcd

@k8s-ci-robot changed the title from "Content error: Minimal HA control plane with external etcd" to "Concern about node count for minimal HA control plane with external etcd" on Aug 23, 2023
@sftim
Contributor

sftim commented Aug 23, 2023

@tjanson would you be happy to see #33033 reopened and this closed as a duplicate?

@sftim
Contributor

sftim commented Aug 23, 2023

Ah, I see you would.
/triage duplicate
/close not-planned

@k8s-ci-robot added the triage/duplicate label on Aug 23, 2023
@k8s-ci-robot closed this as not planned (won't fix, can't repro, duplicate, stale) on Aug 23, 2023
@k8s-ci-robot
Contributor

@sftim: Closing this issue, marking it as "Not Planned".

