
Replica count wrong(?) in Options for Highly Available Topology page #33033

Open
ejensen-mural opened this issue Apr 19, 2022 · 32 comments
Labels
  • help wanted: Denotes an issue that needs help from a contributor. Must meet "help wanted" guidelines.
  • kind/bug: Categorizes issue or PR as related to a bug.
  • language/en: Issues or PRs related to English language.
  • lifecycle/frozen: Indicates that an issue or PR should not be auto-closed due to staleness.
  • priority/backlog: Higher priority than priority/awaiting-more-evidence.
  • sig/api-machinery: Categorizes an issue or PR as relevant to SIG API Machinery.
  • sig/architecture: Categorizes an issue or PR as relevant to SIG Architecture.
  • triage/accepted: Indicates an issue or PR is ready to be actively worked on.

Comments

@ejensen-mural

ejensen-mural commented Apr 19, 2022

Options for Highly Available Topology states, in reference to the external etcd topology, that "A minimum of three hosts for control plane nodes and three hosts for etcd nodes are required for an HA cluster with this topology."

However, it is not entirely clear why the control plane nodes require at least three hosts rather than two when etcd is not stacked on them. etcd is stateful and relies on a quorum to elect a leader, whereas kube-controller-manager and kube-scheduler are stateless and use an active-passive model with simple lease-based leader election.

Is this an error in the documentation, or is there some reason why two control plane nodes are not sufficient even without etcd stacked?
[see comment from 2023-08-23 for more details]
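
For illustration, here is a minimal sketch of the lease-based leader election mentioned above, using client-go's leaderelection package (the lease name demo-controller-lock, the namespace, and the identity are made up for the example; the timings are the common defaults):

```go
package main

import (
	"context"
	"log"
	"time"

	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/client-go/kubernetes"
	"k8s.io/client-go/tools/clientcmd"
	"k8s.io/client-go/tools/leaderelection"
	"k8s.io/client-go/tools/leaderelection/resourcelock"
)

func main() {
	// Load a kubeconfig from the default location (adjust as needed).
	cfg, err := clientcmd.BuildConfigFromFlags("", clientcmd.RecommendedHomeFile)
	if err != nil {
		log.Fatal(err)
	}
	client := kubernetes.NewForConfigOrDie(cfg)

	// A Lease object acts as the lock; only one candidate holds it at a time.
	lock := &resourcelock.LeaseLock{
		LeaseMeta: metav1.ObjectMeta{Name: "demo-controller-lock", Namespace: "kube-system"},
		Client:    client.CoordinationV1(),
		LockConfig: resourcelock.ResourceLockConfig{
			Identity: "replica-a", // unique per candidate, e.g. the pod or host name
		},
	}

	leaderelection.RunOrDie(context.Background(), leaderelection.LeaderElectionConfig{
		Lock:          lock,
		LeaseDuration: 15 * time.Second, // how long a lease is valid without renewal
		RenewDeadline: 10 * time.Second, // leader must renew before this elapses
		RetryPeriod:   2 * time.Second,  // how often candidates retry acquisition
		Callbacks: leaderelection.LeaderCallbacks{
			OnStartedLeading: func(ctx context.Context) {
				log.Println("became leader, doing active work")
				<-ctx.Done()
			},
			OnStoppedLeading: func() {
				log.Println("lost the lease, standing by")
			},
		},
	})
}
```

Whichever replica holds the Lease does the work while the other stands by, which is why failover for these components only needs a second replica rather than a quorum.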

@k8s-ci-robot added the needs-triage label (Indicates an issue or PR lacks a `triage/foo` label and requires one.) Apr 19, 2022
@sftim
Contributor

sftim commented Apr 19, 2022

It's a good question. Maybe this advice predates the Lease API and the current mechanisms for leader election.

I'm going to mark this as a bug and let someone who knows the topic well triage it.

/language en
/kind bug

@k8s-ci-robot added the language/en (Issues or PRs related to English language) and kind/bug (Categorizes issue or PR as related to a bug.) labels Apr 19, 2022
@sftim
Contributor

sftim commented Apr 20, 2022

/retitle Replica count wrong(?) in Options for Highly Available Topology page

@k8s-ci-robot changed the title from "Options for Highly Available Topology" to "Replica count wrong(?) in Options for Highly Available Topology page" Apr 20, 2022
@sftim
Contributor

sftim commented Apr 20, 2022

I think the most appropriate SIG is
/sig architecture

(but I might be wrong)

@k8s-ci-robot added the sig/architecture label (Categorizes an issue or PR as relevant to SIG Architecture.) Apr 20, 2022
@k8s-triage-robot

The Kubernetes project currently lacks enough contributors to adequately respond to all issues and PRs.

This bot triages issues and PRs according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

  • Mark this issue or PR as fresh with /remove-lifecycle stale
  • Mark this issue or PR as rotten with /lifecycle rotten
  • Close this issue or PR with /close
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle stale

@k8s-ci-robot added the lifecycle/stale label (Denotes an issue or PR has remained open with no activity and has become stale.) Jul 19, 2022
@ejensen-mural
Author

/remove-lifecycle stale

@k8s-ci-robot removed the lifecycle/stale label (Denotes an issue or PR has remained open with no activity and has become stale.) Jul 19, 2022
@k8s-triage-robot

The Kubernetes project currently lacks enough contributors to adequately respond to all issues and PRs.

This bot triages issues and PRs according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

  • Mark this issue or PR as fresh with /remove-lifecycle stale
  • Mark this issue or PR as rotten with /lifecycle rotten
  • Close this issue or PR with /close
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle stale

@k8s-ci-robot added the lifecycle/stale label (Denotes an issue or PR has remained open with no activity and has become stale.) Oct 17, 2022
@k8s-triage-robot

The Kubernetes project currently lacks enough active contributors to adequately respond to all issues and PRs.

This bot triages issues and PRs according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

  • Mark this issue or PR as fresh with /remove-lifecycle rotten
  • Close this issue or PR with /close
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle rotten

@k8s-ci-robot added the lifecycle/rotten label (Denotes an issue or PR that has aged beyond stale and will be auto-closed.) and removed the lifecycle/stale label (Denotes an issue or PR has remained open with no activity and has become stale.) Nov 16, 2022
@k8s-triage-robot

The Kubernetes project currently lacks enough active contributors to adequately respond to all issues and PRs.

This bot triages issues according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

  • Reopen this issue with /reopen
  • Mark this issue as fresh with /remove-lifecycle rotten
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/close not-planned

@k8s-ci-robot closed this as not planned (Won't fix, can't repro, duplicate, stale) Dec 16, 2022
@k8s-ci-robot
Contributor

@k8s-triage-robot: Closing this issue, marking it as "Not Planned".

In response to this:

The Kubernetes project currently lacks enough active contributors to adequately respond to all issues and PRs.

This bot triages issues according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

  • Reopen this issue with /reopen
  • Mark this issue as fresh with /remove-lifecycle rotten
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/close not-planned

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@sftim
Contributor

sftim commented Aug 23, 2023

/reopen
/lifecycle frozen

@k8s-ci-robot reopened this Aug 23, 2023
@k8s-ci-robot
Contributor

@sftim: Reopened this issue.

In response to this:

/reopen
/lifecycle frozen

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@k8s-ci-robot added the lifecycle/frozen label (Indicates that an issue or PR should not be auto-closed due to staleness.) and removed the lifecycle/rotten label (Denotes an issue or PR that has aged beyond stale and will be auto-closed.) Aug 23, 2023
@sftim
Contributor

sftim commented Aug 23, 2023

/sig api-machinery

@k8s-ci-robot added the sig/api-machinery label (Categorizes an issue or PR as relevant to SIG API Machinery.) Aug 23, 2023
@sftim
Contributor

sftim commented Aug 23, 2023

/triage accepted

Only two control plane nodes are required

@k8s-ci-robot added the triage/accepted label (Indicates an issue or PR is ready to be actively worked on.) Aug 23, 2023
@k8s-ci-robot removed the needs-triage label (Indicates an issue or PR lacks a `triage/foo` label and requires one.) Aug 23, 2023
@sftim
Contributor

sftim commented Aug 23, 2023

To fix this, change the paragraph:

However, this topology requires twice the number of hosts as the stacked HA topology. A minimum of three hosts for control plane nodes and three hosts for etcd nodes are required for an HA cluster with this topology.

to make it clear that you need an odd number of etcd nodes (three minimum) and two or more control plane nodes. For a high availability architecture, the minimum number of hosts in this kind of control plane is five (you don't need to put bold text in the documentation).
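
As a rough sketch of why the etcd count is three and odd (this is standard Raft quorum arithmetic, nothing kubeadm-specific):

```go
package main

import "fmt"

// quorum is the number of etcd members that must agree for a write to commit.
func quorum(members int) int { return members/2 + 1 }

// tolerated is how many member failures still leave the cluster able to reach quorum.
func tolerated(members int) int { return members - quorum(members) }

func main() {
	for _, n := range []int{1, 2, 3, 4, 5} {
		fmt.Printf("%d members: quorum %d, tolerates %d failure(s)\n", n, quorum(n), tolerated(n))
	}
}
```

Two members tolerate zero failures, and four tolerate only as many as three do, which is why three is the useful minimum and why the member count is kept odd. The control plane nodes in front of etcd have no quorum requirement of their own.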

/help

@k8s-ci-robot
Contributor

@sftim:
This request has been marked as needing help from a contributor.

Guidelines

Please ensure that the issue body includes answers to the following questions:

  • Why are we solving this issue?
  • To address this issue, are there any code changes? If there are code changes, what needs to be done in the code and what places can the assignee treat as reference points?
  • Does this issue have zero to low barrier of entry?
  • How can the assignee reach out to you for help?

For more details on the requirements of such an issue, please see here and ensure that they are met.

If this request no longer meets these requirements, the label can be removed
by commenting with the /remove-help command.

In response to this:

To fix this, change the paragraph:

However, this topology requires twice the number of hosts as the stacked HA topology. A minimum of three hosts for control plane nodes and three hosts for etcd nodes are required for an HA cluster with this topology.

to make it clear that you need an odd number of etcd nodes (three minimum) and two or more control plane nodes. For a high availability architecture, the minimum number of hosts in this kind of control plane is five (you don't need to put bold text in the documentation).

/help

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@k8s-ci-robot added the help wanted label (Denotes an issue that needs help from a contributor. Must meet "help wanted" guidelines.) Aug 23, 2023
@sftim
Contributor

sftim commented Aug 23, 2023

/priority backlog

@k8s-ci-robot added the priority/backlog label (Higher priority than priority/awaiting-more-evidence.) Aug 23, 2023
@neolit123
Member

to make it clear that you need an odd number of etcd nodes (three minimum) and two or more control plane nodes. For a high availability architecture, the minimum number of hosts in this kind of control plane is five (you don't need to put bold text in the documentation).

@sftim FWIW, the 3 control plane nodes in the external etcd topology are intentional, as I pointed out here:
#42691 (comment)

we actually want users to run 3 apiservers for an HA control plane, and not 2.

@logicalhan
Member

we actually want users to run 3 apiservers for an HA control plane, and not 2.

I don't think this is a sufficient response. HA systems are generally characterized by having no single point of failure (that is, by having redundancy), which would be true for a 2-node control plane with a separate 3-node etcd cluster.
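
To make that redundancy argument concrete, here is a minimal sketch that probes the /readyz endpoint of two hypothetical control plane hosts (the hostnames are made up, and depending on cluster configuration the endpoint may require credentials rather than anonymous access):

```go
package main

import (
	"crypto/tls"
	"fmt"
	"net/http"
	"time"
)

func main() {
	// Hypothetical control plane hosts sitting behind a load balancer.
	endpoints := []string{
		"https://cp-1.example.internal:6443/readyz",
		"https://cp-2.example.internal:6443/readyz",
	}

	client := &http.Client{
		Timeout: 5 * time.Second,
		// Quick-and-dirty probe only; a real check should trust the cluster CA instead.
		Transport: &http.Transport{TLSClientConfig: &tls.Config{InsecureSkipVerify: true}},
	}

	healthy := 0
	for _, url := range endpoints {
		resp, err := client.Get(url)
		if err != nil {
			fmt.Printf("%s: unreachable (%v)\n", url, err)
			continue
		}
		resp.Body.Close()
		fmt.Printf("%s: %s\n", url, resp.Status)
		if resp.StatusCode == http.StatusOK {
			healthy++
		}
	}

	// The API server is stateless; losing one of two replicas degrades capacity
	// but loses no data and no quorum, since state lives in the separate etcd cluster.
	fmt.Printf("healthy API servers: %d of %d\n", healthy, len(endpoints))
}
```

With two replicas behind the load balancer, losing one host leaves the other serving; the data itself is protected by the separate etcd quorum.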

@sftim
Contributor

sftim commented Aug 23, 2023

We can clarify minimum vs recommended, especially if these are different. The word “minimum” has a very specific and widely understood meaning.

@logicalhan
Member

We can clarify minimum vs recommended, especially if these are different. The word “minimum” has a very specific and widely understood meaning.

Unless there is some data stating why we would preferentially want 3 control-plane nodes instead of two (when the etcd cluster is not co-located), how could we make a recommendation?

@neolit123
Member

If one googles HA and the minimum of 3 vs 2, the recommendations vary for different systems. It is true that the classic answer is 2. For k8s,
this non-colocated recommendation of 3 control plane machines came from discussions with cluster-lifecycle leads at the time (2017?).

Also, as I mentioned on #42691 (comment),
there was a blog post by Steve Wong, who spoke at k8s.io about how 3 is preferred. cc @cantbewong

So, this proposal for a change here is debatable, and I am -1 overall.

@logicalhan
Member

there was a blog post by Steve Wong, who spoke at k8s.io about how 3 is preferred. cc @cantbewong

Is this really your argument?

@sftim
Contributor

sftim commented Aug 24, 2023

Unless there is some data stating why we would preferentially want 3 control-plane nodes instead of two (when the etcd cluster is not co-located), how could we make a recommendation?

Just as an example (please don't take this seriously):

-However, this topology requires twice the number of hosts as the stacked HA topology.
-A minimum of three hosts for control plane nodes and three hosts for etcd nodes are required for an HA cluster with this topology.
+This topology needs more hosts than the stacked HA topology. The minimum number
+of hosts is five (three etcd hosts and two control plane hosts); however, the Kubernetes
+project recommends running at least four control plane hosts and three etcd hosts,
+because we like the number seven.

Anyway, given the 👍 reactions that I saw in Slack from SIG Architecture, I'm confident that the true minimum is two control plane hosts (provided that they have an etcd cluster to talk to). I don't have much of an opinion on how many control plane hosts or fault isolation zones we should recommend that people operate with.

@neolit123
Member

there was a blog post by Steve Wong, who spoke at k8s.io about how 3 is preferred. cc @cantbewong

Is this really your argument?

no comment.

I like Joe's point about redundancy during upgrades with 3 control plane nodes:
https://kubernetes.slack.com/archives/C5P3FE08M/p1692839159431539?thread_ts=1692809843.558739&cid=C5P3FE08M

And as we can see, different k8s maintainers from the same company may have different opinions.

I'd say leave this recommended minimum at 3.
Perhaps a clarifying note is required on the page, though,
along the lines of "we are aware HA is situational and subject to interpretation".

@sftim
Contributor

sftim commented Sep 4, 2023

@neolit123 how can we resolve the differences of opinion here?

Most people are saying 5 is the actual minimum to achieve n+1 resilience; you're, I think, saying that you'd prefer not to disclose that detail and instead recommend three etcd hosts and m control plane hosts, where m ≥ 3.

We could reword the page to not mention any minimum (although people will then file requests that we document it).

However, SIG Docs can't arbitrate here. We need the SIGs involved to reach agreement on the technical side.

@neolit123
Member

neolit123 commented Sep 4, 2023

I object to reducing the recommended API server count to 2 for the external etcd topology.

Related to 5 vs 3 etcd members: that is true, and yes, the problem can be seen under some conditions. In both cases the admin or controllers need to act, though; with 3, there is potential downtime.

how can we resolve the differences of opinion here?

I cannot resolve that, but 2 notes can be added at the top of the doc:

  • first, a disclaimer: this is our understanding of HA. SIG Cluster Lifecycle can be stated as the "we"
  • second, a note that under some conditions 5 etcd members are much better

@sftim
Contributor

sftim commented Sep 4, 2023

I object to reducing the recommended API server count to 2 for the external etcd topology

Just to clarify: that's not what's being proposed. What's proposed is to document the minimum for a cluster to provide resilience (we don't document against what).

There's room to do a lot more here if there are volunteers with capacity to do it.

@sftim
Contributor

sftim commented Sep 4, 2023

I'll leave this open; right now we (SIG Docs) aren't prioritising the liaison around achieving a consensus position.

@neolit123
Member

Just to clarify: that's not what's being proposed. What's proposed is to document the minimum for a cluster to provide resilience (we don't document against what).

In some posts above, 2 was definitely being proposed as the minimum, and it is not resilient.

There's room to do a lot more here if there are volunteers with capacity to do it.

I can PR the document, as it's maintained by the kubeadm maintainers (the path is /docs/setup/production-environment/tools/kubeadm/ha-topology/), but I can only do that with the notes mentioned here:
#33033 (comment)

I haven't seen better proposals for a content change yet.

@sftim
Contributor

sftim commented Nov 25, 2023

I'd like to document the minimum number of control plane nodes somewhere; ideally not just in the kubeadm docs, because there are other valid ways to deploy a cluster. It's obvious that for non-HA the number is 1, but there's tension about the number for HA.

Can you confirm that there is a plausible and relevant failure mode for a 2-node control plane (backed by a separate 3-node etcd cluster) @neolit123? I'm not convinced that there is, based on my understanding of how much the API server is able to rely on etcd for resilient data persistence and resolution of conflicts.

This is a different question from “does the Kubernetes project recommend three nodes?”. We might well recommend more than the strict minimum for any number of reasons, and to do so is fine. We might even say that for clusters deployed with kubeadm the supported minimum control plane node count is three, stacked etcd or otherwise. Those opinions are valid for the project to hold and publish.

I'm revisiting this issue because people with unusual topologies (eg: scheduler not colocated with API server; API server colocated with etcd but kube-controller-manager and scheduler are separate) are a constituency we don't serve well with our reference docs, and this issue seems to be about serving that specific audience better.

@neolit123
Member

For kubeadm, perhaps we should keep the HA pages the way they are. For k8s in general, there is a lot to talk about regarding failure modes, different topologies, and component instances.

On the topic of 2 vs 3 API servers, I think my comment is the same as
#33033 (comment)

@sftim
Contributor

sftim commented Nov 26, 2023

I'll follow this up with a wider request for primary evidence to support a minimum node count.
(I have seen the “it is not resilient” comment but that is only secondary evidence).
