Restore etcd with 1 replica and scale it up after kapi deployment for HA shoots #9462

Merged
merged 3 commits into from Mar 21, 2024

Contributor

@plkokanov plkokanov commented Mar 21, 2024

How to categorize this PR?

/area control-plane-migration
/kind enhancement

What this PR does / why we need it:
With this PR, during the restore phase of control plane migration of HA shoot clusters, the main and events etcds are initially deployed with only 1 replica each (with all other settings required for multi-node etcd in place). Once the etcds become ready, the kube-apiserver is deployed, and once the kube-apiserver is ready, the etcds are scaled up to 3 replicas.

Previously the kube-apiserver was deployed only after:

  1. The main etcd was created with 1 replica and restored
  2. The main etcd was scaled to 3 replicas and they became ready
  3. All 3 replicas of the events etcd became ready

This led to a 3-minute increase in downtime for HA shoot clusters compared to non-HA shoot clusters during control plane migration.

After talking with @ishan16696, it turns out that we do not have to wait for all 3 replicas of the etcds to be up and running before deploying the kube-apiserver. With this approach, the extra 3 minutes of downtime are no longer present. Note that we now also wait for only 1 replica of the events etcd to become ready before deploying the kube-apiserver, instead of waiting for all 3.
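
To make the change in ordering concrete, here is a minimal Python sketch that models the old and new restore flows as ordered lists of steps and reports which steps gate the kube-apiserver. It is only an illustration, not the Gardener implementation: the step strings and the helpers `steps_blocking_kapi` and `steps_after_kapi` are hypothetical names, and the relative order of the two scale-up steps in the new flow is an assumption.

```python
# Illustrative model only; step names and helpers are hypothetical,
# not taken from the Gardener code base.
DEPLOY_KAPI = "deploy kube-apiserver and wait until ready"

# Ordering before this PR: all etcd replicas gate the kube-apiserver.
OLD_FLOW = [
    "deploy main etcd with 1 replica and restore it",
    "scale main etcd to 3 replicas and wait until ready",
    "wait for all 3 events etcd replicas to be ready",
    DEPLOY_KAPI,
]

# Ordering with this PR: only 1 ready replica per etcd gates the kube-apiserver.
# Assumption: the two scale-up steps are listed sequentially for simplicity.
NEW_FLOW = [
    "deploy main etcd with 1 replica and restore it",
    "deploy events etcd with 1 replica and wait until ready",
    DEPLOY_KAPI,
    "scale main etcd to 3 replicas",
    "scale events etcd to 3 replicas",
]


def steps_blocking_kapi(flow):
    """Return the steps that must finish before the kube-apiserver is deployed."""
    return flow[: flow.index(DEPLOY_KAPI)]


def steps_after_kapi(flow):
    """Return the steps that run only after the kube-apiserver is ready."""
    return flow[flow.index(DEPLOY_KAPI) + 1 :]


if __name__ == "__main__":
    for label, flow in (("old", OLD_FLOW), ("new", NEW_FLOW)):
        print(f"{label} flow, steps gating kube-apiserver:")
        for step in steps_blocking_kapi(flow):
            print(f"  - {step}")
```

In this model, every step that waits for 3 replicas sits before the kube-apiserver in the old flow and after it in the new flow, which is where the downtime saving described above comes from.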

Which issue(s) this PR fixes:
Fixes #

Special notes for your reviewer:
/cc @ishan16696
/cc @shafeeqes if we could get this into v1.91.0

Release note:

During the `restore` phase of control plane migration of HA shoots, the shoot's `kube-apiserver` is deployed immediately after one replica is ready for each of the events and main `etcd`s. The events and main `etcd`s are scaled up to 3 replicas (the current default for HA shoots) after the `kube-apiserver` is deployed and ready. This should greatly reduce the downtime during control plane migration of HA shoots.

@gardener-prow gardener-prow bot requested a review from ishan16696 March 21, 2024 10:21
@gardener-prow gardener-prow bot added area/control-plane-migration Control plane migration related kind/enhancement Enhancement, improvement, extension cla: yes Indicates the PR's author has signed the cla-assistant.io CLA. size/M Denotes a PR that changes 30-99 lines, ignoring generated files. labels Mar 21, 2024
@shafeeqes
Contributor

/cherry-pick release v1.91

@gardener-ci-robot
Contributor

@shafeeqes: once the present PR merges, I will cherry-pick it on top of release v1.91 in a new PR and assign it to you.

In response to this:

/cherry-pick release v1.91

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

Member

@ishan16696 ishan16696 left a comment

LGTM!!

@gardener-prow gardener-prow bot added the lgtm Indicates that a PR is ready to be merged. label Mar 21, 2024
Contributor

gardener-prow bot commented Mar 21, 2024

LGTM label has been added.

Git tree hash: 6c6bd35f2e007d0d3311321e72497483b9b82c04

@gardener-prow gardener-prow bot removed the lgtm Indicates that a PR is ready to be merged. label Mar 21, 2024
@gardener-prow gardener-prow bot requested a review from ishan16696 March 21, 2024 11:49
@gardener-prow gardener-prow bot added size/L Denotes a PR that changes 100-499 lines, ignoring generated files. and removed size/M Denotes a PR that changes 30-99 lines, ignoring generated files. labels Mar 21, 2024
@gardener-prow gardener-prow bot added size/M Denotes a PR that changes 30-99 lines, ignoring generated files. and removed size/L Denotes a PR that changes 100-499 lines, ignoring generated files. labels Mar 21, 2024
@shafeeqes
Contributor

Thanks!
/approve

@gardener-prow gardener-prow bot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Mar 21, 2024
@shafeeqes
Contributor

/lgtm

@gardener-prow gardener-prow bot added the lgtm Indicates that a PR is ready to be merged. label Mar 21, 2024
Contributor

gardener-prow bot commented Mar 21, 2024

LGTM label has been added.

Git tree hash: e18bf6a1ae3dab34c74d8d619cc0543f2c544a59

@gardener-prow gardener-prow bot added do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. and removed do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. labels Mar 21, 2024
@shafeeqes
Contributor

/hold

@gardener-prow gardener-prow bot added the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Mar 21, 2024
@plkokanov
Contributor Author

/test pull-gardener-e2e-kind

@shafeeqes
Contributor

/unhold

@gardener-prow gardener-prow bot removed the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Mar 21, 2024
@plkokanov
Contributor Author

/test pull-gardener-e2e-kind-ha-single-zone

Member

@ishan16696 ishan16696 left a comment

LGTM!!

Contributor

gardener-prow bot commented Mar 21, 2024

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: ishan16696, shafeeqes

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@plkokanov
Contributor Author

/test pull-gardener-e2e-kind-ha-multi-zone

@gardener-prow gardener-prow bot merged commit d0f47f1 into gardener:master Mar 21, 2024
17 checks passed
@gardener-ci-robot
Contributor

@shafeeqes: cannot checkout release v1.91: error checking out "release v1.91": exit status 1 error: pathspec 'release v1.91' did not match any file(s) known to git

In response to this:

/cherry-pick release v1.91

Labels
approved Indicates a PR has been approved by an approver from all required OWNERS files. area/control-plane-migration Control plane migration related cla: yes Indicates the PR's author has signed the cla-assistant.io CLA. kind/enhancement Enhancement, improvement, extension lgtm Indicates that a PR is ready to be merged. size/M Denotes a PR that changes 30-99 lines, ignoring generated files.

4 participants