Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[GEP-20] Zone pinning for shoots on multi-zonal seeds 📌 #6579

Merged
merged 8 commits into from
Sep 7, 2022

Conversation

timuthy
Copy link
Contributor

@timuthy timuthy commented Aug 25, 2022

How to categorize this PR?

/area control-plane
/area high-availability
/kind enhancement

What this PR does / why we need it:
This PR adds the zone pinning requirement discussed by GEP-20, ref.

In essence it does the following:

  • Gardenlet labels namespaces which belong to non-HA or single-zonal shoots if it's responsible for a multi-zonal seed (seed which has worker across availability zones).
  • A pod-zone-affinity webhook in GRM adds a zone-aware affinity term for such control-plane pods so that those will be scheduled into a single zone only.
  • After etcd was deployed successfully, it's assumed that at least one stateful pod was scheduled. Gardenlet then extracts the zone information from that pod/node.
  • From that moment on, pods' affinity configurations is enhanced by a nodeAffinity to "pin" pods to a specific zone. It is especially important to schedule pods to the original zone when shoots are being woken up from hibernation.

Which issue(s) this PR fixes:
Fixes partially #6529

Release note:

Gardener has been being prepared for more shoot HA use-cases and thus some assumption about currently running landscapes are required: If you use a `multi-zonal` labelled seed and scheduled non-HA shoots onto it, this release of Gardener will potentially cause scheduling conflicts to the control-plane pods as it will try to locate all pods into a single zone only. Pods that can't be re-scheduled (mainly because of volume dependencies) will remain in `Pending` state.
Gardener is prepared to run non-HA and single-zonal shoots on multi-zonal seeds. In such a setup, control-plane pods of the mentioned shoots are scheduled into a single availability zone only to avoid any extra cross zonal traffic that would usually involve higher latency and cost. **PLEASE NOTE**: The `StorageClass` in seeds used for control-plane components must have `volumeBindingMode: WaitForFirstConsumer` to let the zone-pinning work properly.

@gardener-prow gardener-prow bot added area/control-plane Control plane related area/high-availability High availability related kind/enhancement Enhancement, improvement, extension cla: yes Indicates the PR's author has signed the cla-assistant.io CLA. labels Aug 25, 2022
@gardener-prow gardener-prow bot added the size/XL Denotes a PR that changes 500-999 lines, ignoring generated files. label Aug 25, 2022
@timuthy
Copy link
Contributor Author

timuthy commented Aug 26, 2022

/hold
I plan to improve the affinity terms after having a second look. With the current approach we rely on the fact, that any pod with a volume is created and scheduled first. This becomes especially important when a shoot is woken up from hibernation. If a stateless pod is scheduled first to any zone, then the remaining stateful pods are supposed to be scheduled to exactly this zone. Unfortunately, there is no guarantee that this stateless pod will run in very same zone that was used before the cluster was hibernated.

This PR will be transferred into draft mode as long as I'm working on the improvement.

@gardener-prow gardener-prow bot added the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Aug 26, 2022
@timuthy timuthy marked this pull request as draft August 26, 2022 06:36
@gardener-prow gardener-prow bot added the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label Aug 26, 2022
@gardener-prow gardener-prow bot added size/XXL Denotes a PR that changes 1000+ lines, ignoring generated files. and removed size/XL Denotes a PR that changes 500-999 lines, ignoring generated files. labels Aug 30, 2022
@timuthy timuthy marked this pull request as ready for review August 30, 2022 12:40
@gardener-prow gardener-prow bot removed the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label Aug 30, 2022
@timuthy
Copy link
Contributor Author

timuthy commented Aug 30, 2022

Update: The PR is ready for review now.

@timuthy timuthy changed the title [GEP-20] Zone pinning for shoots on multi-zonal seeds [GEP-20] Zone pinning for shoots on multi-zonal seeds 📌 Aug 30, 2022
@timuthy timuthy removed the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Aug 30, 2022
@oliver-goetz
Copy link
Member

/assign

@rfranzke
Copy link
Member

rfranzke commented Sep 2, 2022

/assign

Copy link
Member

@oliver-goetz oliver-goetz left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I added some small comments and have two general questions where I don't know enough about the context to answer them by myself 😄

When I see a shoot starting up etcd-main and etcd-events are starting at the same time on different nodes.
a) When there is no zone pinning annotation yet, couldn't it happen, that these etcd nodes would be in located in different zones? If this could happen, their volumes would be in different zones and couldn't be pinned to the same zone anymore right?
b) How about the migration scenario when introducing zone pinning. There are other control-plane components like loki and prometheus which use PVCs. Could it be possible that their volumes are located in different zones and could not be attached anymore when zone pinning was activated?

pkg/operation/botanist/namespaces.go Outdated Show resolved Hide resolved
pkg/resourcemanager/webhook/podzoneaffinity/handler.go Outdated Show resolved Hide resolved
pkg/apis/core/v1beta1/constants/types_constants.go Outdated Show resolved Hide resolved
pkg/operation/botanist/namespaces.go Outdated Show resolved Hide resolved
pkg/operation/botanist/namespaces.go Outdated Show resolved Hide resolved
docs/concepts/resource-manager.md Outdated Show resolved Hide resolved
pkg/resourcemanager/webhook/podzoneaffinity/handler.go Outdated Show resolved Hide resolved
pkg/resourcemanager/webhook/podzoneaffinity/handler.go Outdated Show resolved Hide resolved
pkg/resourcemanager/webhook/podzoneaffinity/handler.go Outdated Show resolved Hide resolved
After etcd is ready we assume that at least one pod with a volume has been scheduled. This is the earliest point in time when we can "pin" the zone, i.e. all remaining volumes also reside in that zone and for the future it needs to be assured that all pods (esp. the ones with volumes) are scheduled there.
For non-HA or single-zonal HA shoot clusters, control-plane pods should be scheduled into a single zone only to avoid any cross zonal traffic.
@timuthy
Copy link
Contributor Author

timuthy commented Sep 5, 2022

Thanks for your feedback @rfranzke and @oliver-goetz. I addressed all your comments, open is still #6579 (comment). PTAL.

@rfranzke
Copy link
Member

rfranzke commented Sep 6, 2022

Gardener has been being prepared for more shoot HA use-cases and thus some assumption about currently running landscapes are required: If you use a `multi-zonal` labelled seed and scheduled non-HA shoots onto it, this release of Gardener will potentially cause scheduling conflicts to the control-plane pods as it will try to locate all pods into a single zone only. Pods that can't be re-scheduled (mainly because of volume dependencies) will remain in `Pending` state.

Just to double-check: This is likely no problem because the "seed.gardener.cloud/multi-zonal" label must be added manually to the Seeds, right? I can imagine that there are folks in the wild that have seeds with multiple AZs, however I believe that nobody has added this label yet given that it's quite new and the whole HA story is still alpha/in development. In this case, I'm fine with it.

Copy link
Member

@rfranzke rfranzke left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for addressing my comments @timuthy!

@timuthy
Copy link
Contributor Author

timuthy commented Sep 6, 2022

@rfranzke

Just to double-check: This is likely no problem because the "seed.gardener.cloud/multi-zonal" label must be added manually to the Seeds, right?

Yes, that's mostly right. We planned to enable automatic labeling for ManagedSeeds because there we have information about workers.

@timuthy
Copy link
Contributor Author

timuthy commented Sep 6, 2022

Thanks @rfranzke. I incorporated your feedback and in addition found some spots where I could improve the test handling regarding your suggestions: 56a86cf, b4a4869

Copy link
Member

@rfranzke rfranzke left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Cool @timuthy, thanks for also cleaning up the other webhook test suites!

@rfranzke
Copy link
Member

rfranzke commented Sep 7, 2022

/approve

@oliver-goetz Do you also want to have another look?

@gardener-prow gardener-prow bot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Sep 7, 2022
Copy link
Member

@oliver-goetz oliver-goetz left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There is one small thing left.

Comment on lines 82 to 85
delete(namespace.Labels, v1beta1constants.ShootControlPlaneEnforceZone)
if zonePinningRequired(b.Shoot.GetInfo(), b.Seed.GetInfo()) && !metav1.HasLabel(namespace.ObjectMeta, v1beta1constants.ShootControlPlaneEnforceZone) {
metav1.SetMetaDataLabel(&namespace.ObjectMeta, v1beta1constants.ShootControlPlaneEnforceZone, "")
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
delete(namespace.Labels, v1beta1constants.ShootControlPlaneEnforceZone)
if zonePinningRequired(b.Shoot.GetInfo(), b.Seed.GetInfo()) && !metav1.HasLabel(namespace.ObjectMeta, v1beta1constants.ShootControlPlaneEnforceZone) {
metav1.SetMetaDataLabel(&namespace.ObjectMeta, v1beta1constants.ShootControlPlaneEnforceZone, "")
}
if !zonePinningRequired(b.Shoot.GetInfo(), b.Seed.GetInfo()) {
delete(namespace.Labels, v1beta1constants.ShootControlPlaneEnforceZone)
} else if !metav1.HasLabel(namespace.ObjectMeta, v1beta1constants.ShootControlPlaneEnforceZone) {
metav1.SetMetaDataLabel(&namespace.ObjectMeta, v1beta1constants.ShootControlPlaneEnforceZone, "")
}

Otherwise, the label would be changed back and forth during reconciliation, if enforce zone is activated.
This function would set its value to "", but AddZoneInformationToSeedNamespace would add a zone value again later.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice catch @oliver-goetz. I fixed it, PTAL.

Copy link
Member

@oliver-goetz oliver-goetz left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

/lgtm

🚀

@gardener-prow gardener-prow bot added the lgtm Indicates that a PR is ready to be merged. label Sep 7, 2022
@gardener-prow
Copy link
Contributor

gardener-prow bot commented Sep 7, 2022

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: oliver-goetz, rfranzke

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:
  • OWNERS [oliver-goetz,rfranzke]

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
approved Indicates a PR has been approved by an approver from all required OWNERS files. area/control-plane Control plane related area/high-availability High availability related cla: yes Indicates the PR's author has signed the cla-assistant.io CLA. kind/enhancement Enhancement, improvement, extension lgtm Indicates that a PR is ready to be merged. size/XXL Denotes a PR that changes 1000+ lines, ignoring generated files.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants