Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Introduce new feature gate VPAForETCD to use VPA for etcd autoscaling #8984

Merged

Conversation

voelzmo
Copy link
Member

@voelzmo voelzmo commented Dec 20, 2023

How to categorize this PR?

/area auto-scaling
/area control-plane
/kind enhancement

What this PR does / why we need it:
The main reason why etcd currently uses HVPA is for limiting when VPA recommendations should be applied. The two use cases are preventing downscaling entirely, or limiting downscaling to the Shoot's maintenance window. Additionally, there are some "minimum change sizes" defined on the HVPA object, meaning that a change in resource requests smaller than the minimum change size will not be applied.
This PR introduces a method to achieve a similar functionality using plain VPA, making it possible to remove HVPA for etcd entirely in a next step.

In this PR we update to VPA version 1.1.1, which contains a new upstream feature, called EvictionRequirements, that allows us to limit evictions based on the scaling direction (is the workload being scaled up or being scaled down). Detailed information for this feature can be found in the original enhancement proposal.

In detail, this PR

  • Introduces a new featuregate VPAForETCD on gardenlet and garden-operator
  • Introduces a new controller in gardenlet, called VPAEvictionRequirementsController to manage the EvictionRequirements on VPA objects based on two new labels and a new annotation
  • takes care that the garden-operator starts the controller instead of gardenlet, if necessary

Introduces the following changes in case operators have enabled the new featuregate for the gardenlet on their Seeds:

  • Use plain VPA to scale etcd
  • Label the VPA object with
    • Label autoscaling.gardener.cloud/eviction-requirements: managed-by-controller to indicate that the EvictionRequirements on this VPA object will be managed by the new controller
  • Annotate the VPA with eviction-requirements.autoscaling.gardener.cloud/downscale-restriction: in-maintenance-window-only in case of limiting scaling down to a Shoot's maintenance window
  • Annotate the VPA with eviction-requirements.autoscaling.gardener.cloud/downscale-restriction: never in case of denying scaling down entirely
    • These annotations are currently assigned in the same way as for HVPA, based on the ShootClass
    • Add the annotation shoot.gardener.cloud/maintenance-window to make the Shoot maintenance window accessible for the new controller
  • Drop the "minimum change size" requirement, as it lead to severe deadlocks in the past in which human interaction was necessary to get etcd out of a broken state

If HVPA is enabled, the VPAForETCD feature gate takes precedence, so the etcds will be scaled like described above with plain VPA and the corresponding HVPA object will be removed.

Release note:

Updated VPA to 1.1.1
A new feature gate named `VPAForETCD` is now introduced for gardenlet and gardener-operator. When enabled, VPA for etcd is used, regardless of the HVPA feature gate setting. The new VPA limits scaling down to a Shoot's maintenance window or even entirely based on the `ShootClass` in the same way as it is currently done for HVPA.

@gardener-prow gardener-prow bot added the area/auto-scaling Auto-scaling (CA/HPA/VPA/HVPA, predominantly control plane, but also otherwise) related label Dec 20, 2023
@gardener-prow gardener-prow bot requested a review from acumino December 20, 2023 20:38
@gardener-prow gardener-prow bot added the area/control-plane Control plane related label Dec 20, 2023
@gardener-prow gardener-prow bot added kind/enhancement Enhancement, improvement, extension cla: yes Indicates the PR's author has signed the cla-assistant.io CLA. size/XL Denotes a PR that changes 500-999 lines, ignoring generated files. labels Dec 20, 2023
@voelzmo
Copy link
Member Author

voelzmo commented Dec 20, 2023

/hold as this uses an unreleased VPA version

@gardener-prow gardener-prow bot added the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Dec 20, 2023
@voelzmo voelzmo force-pushed the enh/eviction-requirements-for-etcd branch from c70837a to 5bc931c Compare December 20, 2023 21:28
Copy link
Contributor

@ScheererJ ScheererJ left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the very interesting improvement.

I have a few questions/comments.

pkg/component/etcd/etcd.go Outdated Show resolved Hide resolved
pkg/component/etcd/etcd.go Outdated Show resolved Hide resolved
pkg/component/etcd/etcd.go Outdated Show resolved Hide resolved
pkg/component/etcd/etcd_test.go Outdated Show resolved Hide resolved
pkg/gardenlet/controller/vpaevictionrequirements/add.go Outdated Show resolved Hide resolved
Copy link
Contributor

@andrerun andrerun left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There is one minor fix required, one idea which may or may not be worth implementing, and one question.

go.mod Outdated Show resolved Hide resolved
pkg/apis/core/v1beta1/constants/types_constants.go Outdated Show resolved Hide resolved
pkg/apis/core/v1beta1/constants/types_constants.go Outdated Show resolved Hide resolved
pkg/apis/core/v1beta1/constants/types_constants.go Outdated Show resolved Hide resolved
pkg/component/etcd/etcd.go Outdated Show resolved Hide resolved
pkg/component/etcd/etcd.go Outdated Show resolved Hide resolved
pkg/component/etcd/etcd_test.go Outdated Show resolved Hide resolved
Copy link
Contributor

@andrerun andrerun left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM!

@gardener-prow gardener-prow bot added lgtm Indicates that a PR is ready to be merged. and removed lgtm Indicates that a PR is ready to be merged. labels Jan 9, 2024
@voelzmo voelzmo force-pushed the enh/eviction-requirements-for-etcd branch from 4cdfec7 to 9ad9cfe Compare January 11, 2024 09:17
@voelzmo
Copy link
Member Author

voelzmo commented Jan 11, 2024

Also adapted the PR description to match the changed labels.

@gardener-prow gardener-prow bot added the lgtm Indicates that a PR is ready to be merged. label Jan 11, 2024
Copy link
Contributor

@ScheererJ ScheererJ left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

/lgtm

Then we only need to get the VPA release in.

@gardener-prow gardener-prow bot added the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Jan 17, 2024
@rfranzke
Copy link
Member

/assign

Copy link
Member

@rfranzke rfranzke left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the PR, and sorry for joining the review late (I was on leave). Apart from the above feedback, please also add

pkg/apis/core/v1beta1/constants/types_constants.go Outdated Show resolved Hide resolved
pkg/gardenlet/apis/config/v1alpha1/types.go Show resolved Hide resolved
pkg/gardenlet/apis/config/v1alpha1/types.go Show resolved Hide resolved
pkg/apis/core/v1beta1/constants/types_constants.go Outdated Show resolved Hide resolved
pkg/gardenlet/controller/vpaevictionrequirements/add.go Outdated Show resolved Hide resolved
@gardener-prow gardener-prow bot added the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Apr 25, 2024
@voelzmo voelzmo force-pushed the enh/eviction-requirements-for-etcd branch from baa8120 to 5a523a6 Compare April 26, 2024 09:50
@gardener-prow gardener-prow bot added approved Indicates a PR has been approved by an approver from all required OWNERS files. and removed needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. labels Apr 26, 2024
@voelzmo voelzmo force-pushed the enh/eviction-requirements-for-etcd branch from 5a523a6 to b6a25b8 Compare April 26, 2024 10:00
@voelzmo voelzmo force-pushed the enh/eviction-requirements-for-etcd branch from b6a25b8 to e228f21 Compare April 26, 2024 10:58
@voelzmo
Copy link
Member Author

voelzmo commented Apr 26, 2024

Updated to the new VPA release 1.1.1, which fixes the NPE and rebased. Also included changes for the license headers in the new files added by this commit. This is good to go from my side!

pkg/features/features.go Outdated Show resolved Hide resolved
docs/deployment/feature_gates.md Outdated Show resolved Hide resolved
@voelzmo voelzmo force-pushed the enh/eviction-requirements-for-etcd branch from e228f21 to c6711cb Compare April 26, 2024 11:49
@voelzmo
Copy link
Member Author

voelzmo commented Apr 26, 2024

/retest

@rfranzke
Copy link
Member

/lgtm

@gardener-prow gardener-prow bot added the lgtm Indicates that a PR is ready to be merged. label Apr 26, 2024
Copy link
Contributor

gardener-prow bot commented Apr 26, 2024

LGTM label has been added.

Git tree hash: 175ab6b00bdbdd48251a68a19ddb10ac2fc8fa4f

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
approved Indicates a PR has been approved by an approver from all required OWNERS files. area/auto-scaling Auto-scaling (CA/HPA/VPA/HVPA, predominantly control plane, but also otherwise) related area/control-plane Control plane related cla: yes Indicates the PR's author has signed the cla-assistant.io CLA. kind/enhancement Enhancement, improvement, extension lgtm Indicates that a PR is ready to be merged. size/XXL Denotes a PR that changes 1000+ lines, ignoring generated files.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

7 participants