Skip to content

Latest commit

 

History

History
344 lines (252 loc) · 27.2 KB

13-automated-seed-management.md

File metadata and controls

344 lines (252 loc) · 27.2 KB

Automated Seed Management

Automated seed management involves automating certain aspects of managing seeds in Garden clusters, such as:

Implementing the above features would involve changes to various existing Gardener components, as well as perhaps introducing new ones. This document describes these features in more detail and proposes a design approach for some of them.

In Gardener, scheduling shoots onto seeds is quite similar to scheduling pods onto nodes in Kubernetes. Therefore, a guiding principle behind the proposed design approaches is taking advantage of best practices and existing components already used in Kubernetes.

Ensuring Seeds Capacity for Shoots Is Not Exceeded

Seeds have a practical limit of how many shoots they can accommodate. Exceeding this limit is undesirable as the system performance will be noticeably impacted. Therefore, it is important to ensure that a seed's capacity for shoots is not exceeded by introducing a maximum number of shoots that can be scheduled onto a seed and making sure that it is taken into account by the scheduler.

An initial discussion of this topic is available in Issue #2938. The proposed solution is based on the following flow:

  • The gardenlet is configured with certain resources and their total capacity (and, for certain resources, the amount reserved for Gardener).
  • The gardenlet seed controller updates the Seed status with the capacity of each resource and how much of it is actually available to be consumed by shoots, using capacity and allocatable fields that are very similar to the corresponding fields in the Node status.
  • When scheduling shoots, gardener-scheduler is influenced by the remaining capacity of the seed. In the simplest possible implementation, it never schedules shoots onto a seed that has already reached its capacity for a resource needed by the shoot.

Initially, the only resource considered would be the maximum number of shoots that can be scheduled onto a seed. Later, more resources could be added to make more precise scheduling calculations.

Note: Resources could also be requested by shoots, similarly to how pods can request node resources, and the scheduler could then ensure that such requests are taken into account when scheduling shoots onto seeds. However, the user is rarely, if at all, concerned with what resources does a shoot consume from a seed, and this should also be regarded as an implementation detail that could change in the future. Therefore, such resource requests are not included in this GEP.

In addition, an extensibility plugin framework could be introduced in the future in order to advertise custom resources, including provider-specific resources, so that gardenlet would be able to update the seed status with their capacity and allocatable values, for example load balancers on Azure. Such a concept is not described here in further details as it is sufficiently complex to require a separate GEP.

Example Seed status with capacity and allocatable fields:

status:
  capacity:
    shoots: "100"
    persistent-volumes: "200" # Built-in resource
    azure.provider.extensions.gardener.cloud/load-balancers: "30" # Custom resource advertised by an Azure-specific plugin
  allocatable:
    shoots: "100"
    persistent-volumes: "197" # 3 persistent volumes are reserved for Gardener
    azure.provider.extensions.gardener.cloud/load-balancers: "300"

Gardenlet Configuration

As mentioned above, the total resource capacity for built-in resources such as the number of shoots is specified as part of the gardenlet configuration, not in the Seed spec. The gardenlet configuration itself could be specified in the spec of the newly introduced ManagedSeed resource. Here it is assumed that in the future this could become the recommended and most widely used way to manage seeds. If the same gardenlet is responsible for multiple seeds, they would all share the same capacity settings.

To specify the total resource capacity for built-in resources, as well as the amount of such resources reserved for Gardener, the 2 new fields resources.capacity and resources.reserved are introduced in the GardenletConfiguration resource. The gardenlet seed controller would then initialize the capacity and allocatable fields in the seed status as follows:

  • The capacity value is set to the configured resources.capacity.
  • The allocatable value is set to the configured resources.capacity minus resources.reserved.

Example GardenletConfiguration with resources.capacity and resources.reserved field:

resources:
  capacity:
    shoots: 100
    persistent-volumes: 200
  reserved:
    persistent-volumes: 3

Scheduling Algorithm

Currently gardener-scheduler uses a simple non-extensible algorithm in order to schedule shoots onto seeds. It goes through the following stages:

  • Filter out seeds that don't meet scheduling requirements such as being ready, matching cloud profile and shoot label selectors, matching the shoot provider, and not having taints that are not tolerated by the shoot.
  • From the remaining seeds, determine candidates that are considered best based on their region, by using a strategy that can be either "same region" or "minimal distance".
  • Among these candidates, choose the one with the least number of shoots.

This scheduling algorithm should be adapted in order to properly take into account resources capacity and requests. As a first step, during the filtering stage, any seeds that would exceed their capacity for shoots, or their capacity for any resources requested by the shoot, should simply be filtered out and not considered during the next stages.

Later, the scheduling algorithm could be further enhanced by replacing the step in which the region strategy is applied by a scoring step similar to the one in Kubernetes Scheduler. In this scoring step, the scheduler would rank the remaining seeds to choose the most suitable shoot placement. It would assign a score to each seed that survived filtering based on a list of scoring rules. These rules might include for example MinimalDistance and SeedResourcesLeastAllocated, among others. Each rule would produce its own score for the seed, and the overall seed score would be calculated as a weighted sum of all such scores. Finally, the scheduler would assign the shoot to the seed with the highest ranking.

ManagedSeeds

When all or most of the existing seeds are near capacity, new seeds should be created in order to accommodate more shoots. Conversely, sometimes there could be too many seeds for the number of shoots, and so some of the seeds could be deleted to save resources. Currently, the process of creating a new seed involves a number of manual steps, such as creating a new shoot that meets certain criteria, and then registering it as a seed in Gardener. This could be automated to some extent by annotating a shoot with the use-as-seed annotation, in order to create a "shooted seed". However, adding more than one similar seeds still requires manually creating all needed shoots, annotating them appropriately, and making sure that they are successfully reconciled and registered.

To create, delete, and update seeds effectively in a declarative way and allow auto-scaling, a "creatable seed" resource along with a "set" (and in the future, perhaps also a "deployment") of such creatable seeds should be introduced, similar to Kubernetes Pod, ReplicaSet, and Deployment (or to MCM Machine, MachineSet, and MachineDeployment) resources. With such resources (and their respective controllers), creating a new seed based on a template would become as simple as increasing the replicas field in the "set" resource.

In Issue #2181 it is already proposed that the use-as-seed annotation is replaced by a dedicated ShootedSeed resource. The solution proposed here further elaborates on this idea.

ManagedSeed Resource

The ManagedSeed resource is a dedicated custom resource that represents an evolution of the "shooted seed" and properly replaces the use-as-seed annotation. This resource contains:

  • The name of the Shoot that should be registered as a Seed.
  • An optional seed section that contains the Seed spec and parts of the metadata, such as labels and annotations.
  • An optional gardenlet section that contains:
    • Certain aspects of the gardenlet deployment configuration, such as the number of replicas, the image, which bootstrap mechanism to use (bootstrap token / service account), etc.
    • The GardenletConfiguration resource that contains controllers configuration, feature gates, etc.

Either the seed or the gardenlet section must be specified, but not both:

  • If the seed section is specified, gardenlet should not be deployed in the shoot, and a new seed should be registered based on the seed section.
  • If the gardenlet section is specified, gardenlet should be deployed in the shoot, and it will register a new seed upon startup based on the seedConfig section of the GardenletConfiguration resource.

A ManagedSeed allows fine-tuning the seed and the gardenlet configuration of shooted seeds in order to deviate from the global defaults, e.g. lower the concurrent sync for some of the seed's controllers or enable a feature gate only on certain seeds. Also, it simplifies the deletion protection of such seeds.

Also, the ManagedSeed resource is a more powerful alternative to the use-as-seed annotation. The implementation of the use-as-seed annotation itself could be refactored to use a ManagedSeed resource extracted from the annotation by a controller.

Although in this proposal a ManagedSeed is always a "shooted seed", that is a Shoot that is registered as a Seed, this idea could be further extended in the future by adding a type field that could be either Shoot (implied in this proposal), or something different. Such an extension would allow to register and manage as Seed a cluster that is not a Shoot, e.g. a GKE cluster.

Last but not least, ManagedSeeds could be used as the basis for creating and deleting seeds automatically via the ManagedSeedSet resource that is described in ManagedSeedSets.

Unlike the Seed resource, the ManagedSeed resource is namespaced. If created in the garden namespace, the resulting seed is globally available. If created in a project namespace, the resulting seed can be used as a "private seed" by shoots in the project, either by being decorated with project-specific taints and labels, or by being of the special PrivateSeed kind that is also namespaced. The concept of private seeds / cloudprofiles is described in Issue #2874. Until this concept is implemented, ManagedSeed resources might need to be restricted to the garden namespace, similarly to how shoots with the use-as-seed annotation currently are.

Example ManagedSeed resource with a seed section:

apiVersion: seedmanagement.gardener.cloud/v1alpha1
kind: ManagedSeed
metadata:
  name: crazy-botany
  namespace: garden
spec:
  shoot:
    name: crazy-botany # Shoot that should be registered as a Seed
  seed: # Seed template, including spec and parts of the metadata
    metadata:
      labels:
        foo: bar
    spec:
      provider:
        type: gcp
        region: europe-west1
      taints:
      - key: seed.gardener.cloud/protected
      ...

Example ManagedSeed resource with a gardenlet section:

apiVersion: seedmanagement.gardener.cloud/v1alpha1
kind: ManagedSeed
metadata:
  name: crazy-botany
  namespace: garden
spec:
  shoot:
    name: crazy-botany # Shoot that should be registered as a Seed
  gardenlet: 
    deployment: # Gardenlet deployment configuration
      replicas: 1
      revisionHistoryLimit: 10
      serviceAccountName: gardenlet
      image:
        repository: eu.gcr.io/gardener-project/gardener/gardenlet
        tag: latest
        pullPolicy: IfNotPresent
      resources:
        ...
      annotations: 
        ...
      labels:
        ...
      env:
        ...
      imageVectorOverwrite: | # See docs/deployment/image_vector.md
        ...
      componentImageVectorOverwrites: | # See docs/deployment/image_vector.md
        ...
      dnsConfig:
        ...
      additionalVolumes:
        ...
      additionalVolumeMounts:
        ...
      vpa: false
    config: # GardenletConfiguration resource
      seedConfig: # Configuration of the Seed to be registered
        ...
      controllers:
        shoot:
          concurrentSyncs: 20
      featureGates:
        CachedRuntimeClients: true
      ...

ManagedSeed Controller

ManagedSeeds are reconciled by a new managed seed controller in gardenlet. Its implementation is very similar to the current seed registration controller, and in fact could be regarded as a refactoring of the latter, with the difference that it uses the ManagedSeed resource rather than the use-as-seed annotation on a Shoot. The gardenlet only reconciles ManagedSeeds that refer to Shoots scheduled on Seeds the gardenlet is responsible for.

Once this controller is considered sufficiently stable, the current use-as-seed annotation and the controller mentioned above should be marked as deprecated and eventually removed.

A ManagedSeed that is in use by shoots cannot be deleted, unless the shoots are either deleted or moved to other seeds first. The managed seed controller ensures that this is the case.

Provider-specific Seed Bootstrapping Actions

Bootstrapping a new seed might require additional provider-specific actions to the ones performed automatically by the managed seed controller. For example, on Azure this might include getting a new subscription, extending quotas, etc. This could eventually be automated by introducing an extension mechanism for the Gardener seed bootstrapping flow, to be handled by a new type of controller in the provider extensions. However, such an extension mechanism is not in the scope of this proposal and might require a separate GEP.

One idea that could be further explored is the use shoot readiness gates, similar to Kubernetes pod readiness gates, in order to control whether a Shoot is considered Ready before it could be registered as a Seed. A provider-specific extension could set the special condition that is specified as a readiness gate to True only after it has successfully performed the provider-specific actions needed.

Changes to Existing Controllers

Since the Shoot registration as a Seed is decoupled from the Shoot reconciliation, existing gardenlet controllers would not have to be changed in order to properly support ManagedSeeds. The only change to gardenlet that would be needed is introducing the new managed seed controller mentioned above, and possibly retiring the old one at some point. Also, the Shoot controller might need to be adapted to ensure that Shoots registered as Seeds cannot be deleted until they ManagedSeed that caused the registration is itself deleted.

The introduction of the ManagedSeed resource would also require no changes to existing gardener-controller-manager controllers that operate on Shoots (for example, shoot hibernation and maintenance controllers).

ManagedSeedSets

Similarly to a ReplicaSet, the purpose of a ManagedSeedSet is to maintain a stable set of replica ManagedSeeds available at any given time. As such, it is used to guarantee the availability of a specified number of identical ManagedSeeds, on an equal number of identical Shoots.

ManagedSeedSet Resource

The ManagedSeedSet resource has a selector field that specifies how to identify ManagedSeeds it can acquire, a number of replicas indicating how many ManagedSeeds (and their corresponding Shoots) it should be maintaining, and a two templates:

  • A ManagedSeed template (template) specifying the data of new ManagedSeeds it should create to meet the number of replicas criteria.
  • A Shoot template (shootTemplate) specifying the data of new Shoots it should create to host the ManagedSeeds.

A ManagedSeedSet then fulfills its purpose by creating and deleting ManagedSeeds (and their corresponding Shoots) as needed to reach the desired number.

A ManagedSeedSet is linked to its ManagedSeeds and Shoots via the metadata.ownerReferences field, which specifies what resource the current object is owned by. All ManagedSeeds and Shoots acquired by a ManagedSeedSet have their owning ManagedSeedSet's identifying information within their ownerReferences field.

Example ManagedSeedSet resource:

apiVersion: seedmanagement.gardener.cloud/v1alpha1
kind: ManagedSeedSet
metadata:
  name: crazy-botany
  namespace: garden
spec:
  replicas: 1
  selector:
    matchLabels:
      foo: bar
  template: # ManagedSeed template, including spec and parts of the metadata
    metadata:
      labels:
        foo: bar
    spec: 
      # shoot.name is not specified since it's filled automatically by the controller
      seed: # Either a seed or a gardenlet section must be specified, see above
        metadata:
          labels:
            foo: bar
        provider:
          type: gcp
          region: europe-west1
        taints:
        - key: seed.gardener.cloud/protected
        ...
  shootTemplate: # Shoot template, including spec and parts of the metadata
    metadata:
      labels:
        foo: bar
    spec:
      cloudProfileName: gcp
      secretBindingName: shoot-operator-gcp
      region: europe-west1
      provider:
        type: gcp
      ...

ManagedSeedSet Controller

ManagedSeedSets are reconciled by a new managed seed set controller in gardener-controller-manager. During the reconciliation this controller creates and deletes ManagedSeeds and Shoots in response to changes to the replicas and selector fields.

Note: The introduction of the ManagedSeedSet resource would not require any changes to gardenlet or to existing gardener-controller-manager controllers.

ManagedSeedDeployments

A ManagedSeedSet, similarly to a ReplicaSet, does not manage updates to its replicas in any way. In the future, we might introduce ManagedSeedDeployments, a higher-level concept that manages ManagedSeedSets and provides declarative updates to ManagedSeeds along with other useful features, similarly to a Deployment, or perhaps a StatefulSet. An implementation that is more like a StatefulSet rather than a Deployment would be rolling seed by seed, stopping on rollout-failures (see Issue #87), etc.

There is an important difference between seeds and pods or nodes in that seeds are more "heavyweight" and therefore updating a set of seeds by introducing new seeds and moving shoots to them tends to be much more complex, time-consuming, and prone to failures compared to updating the seeds "in place". Furthermore, updating seeds in this way depends on a mature implementation of GEP-7: Shoot Control Plane Migration, which is not available right now. Therefore, ManagedSeedDeployments are not described in detail here, as properly dealing with the challenges mentioned requires a separate GEP.

Scaling-in ManagedSeedSets

Deleting ManagedSeeds in response to decreasing the replicas of a ManagedSeedSet deserves special attention for two reasons:

  • A seed that is already in use by shoots cannot be deleted, unless the shoots are either deleted or moved to other seeds first.
  • When there are more empty seeds than requested for deletion, determining which seeds to delete might not be as straightforward as with pods or nodes, if the seeds are based on different versions of the template and therefore not completely identical.

The above challenges could be addressed as follows:

  • In order to scale in a ManagedSeedSet successfully, there should be at least as many empty ManagedSeeds as the difference between the old and the new replicas. In some cases, the user might need to ensure that this is the case by draining some seeds manually before decreasing the replicas field.
  • It should be possible to protect ManagedSeeds from deletion even if they are empty, perhaps via an annotation such as seedmanagement.gardener.cloud/protect-from-deletion. Such seeds are not taken into account when determining whether the scale in operation can succeed.
  • The decision which seeds to delete among the ManagedSeeds that are empty and not protected should be based on hints, perhaps again in the form of annotations, that could be added manually by the user, as well as other factors, see Prioritizing ManagedSeed Deletion.

Prioritizing ManagedSeed Deletion

To help the controller decide which empty ManagedSeeds are to be deleted first, the user could manually annotate ManagedSeeds with a seed priority annotation such as seedmanagement.gardener.cloud/priority. ManagedSeeds with lower priority are more likely to be deleted first. If not specified, a certain default value is assumed, for example 3.

Besides this annotation, the controller should take into account also other factors, such as the current seed conditions (NotReady should be preferred for deletion over Ready), as well as its age (older should be preferred for deletion over newer).

Auto-scaling Seeds

The most interesting and advanced automated seed management feature is making sure that a Garden cluster has enough seeds registered to schedule new shoots (and, in the future, reschedule shoots from drained seeds) without exceeding the seeds capacity for shoots, but not more than actually needed at any given moment. This would involve introducing an auto-scaling mechanism for seeds in Garden clusters.

The proposed solution builds upon the ideas introduced earlier. The ManagedSeedSet resource (and in the future, also the ManagedSeedDeployment resource) could have a scale subresource that changes the replicas field. This would allow a new "seed autoscaler" controller to scale these resources via a special "autoscaler" resource (for example SeedAutoscaler), similarly to how the Kubernetes Horizontal Pod Autoscaler controller scales pods, as described in Horizontal Pod Autoscaler Walkthrough.

The primary metric used for scaling should be the number of shoots already scheduled onto that seed either as a direct value or as a percentage of the seed's capacity for shoots introduced in Ensuring Seeds Capacity for Shoots Is Not Exceeded (utilization). Later, custom metrics based on other resources, including provider-specific resources, could be considered as well.

Note: Even if the controller is called Horizontal Pod Autoscaler, it is capable of scaling any resource with a scale subresource, using any custom metric. Therefore, initially it was proposed to use this controller directly. However, a number of important drawbacks were identified with this approach, and so it is no longer proposed here.

SeedAutoscaler Resource

The SeedAutoscaler automatically scales the number of ManagedSeeds in a ManagedSeedSet based on observed resource utilization. The resource could be any resource that is tracked via the capacity and allocatable fields in the Seed status, including in particular the number of shoots already scheduled onto the seed.

The SeedAutoscaler is implemented as a custom resource and a new controller. The resource determines the behavior of the controller. The SeedAutoscaler resource has a scaleTargetRef that specifies the target resource to be scaled, the minimum and maximum number of replicas, as well as a list of metrics. The only supported metric type initially is Resource for resources that are tracked via the capacity and allocatable fields in the Seed status. The resource target can be of type Utilization or AverageValue.

Example SeedAutoscaler resource:

apiVersion: seedmanagement.gardener.cloud/v1alpha1
kind: SeedAutoscaler
metadata:
  name: crazy-botany
  namespace: garden
spec:
  scaleTargetRef:
    apiVersion: seedmanagement.gardener.cloud/v1alpha1
    kind: ManagedSeedSet
    name: crazy-botany
  minReplicas: 1
  maxReplicas: 10
  metrics:
  - type: Resource # Only Resource is supported
    resource:
      name: shoots
      target:
        type: Utilization # Utilization or AverageValue
        averageUtilization: 50

SeedAutoscaler Controller

SeedAutoscaler resources are reconciled by a new seed autoscaler controller, either in gardener-controller-manager or out-of-tree, similarly to cluster-autoscaler. The controller periodically adjusts the number of replicas in a ManagedSeedSet to match the observed average resource utilization to the target specified by user.

Note: The SeedAutoscaler controller should perhaps not be limited to evaluating only metrics, it could also take into account also taints, label selectors, etc. This is not yet reflected in the example SeedAutoscaler resource above. Such details are intentionally not specified in this GEP, they should be further explored in the issues created to track the actual implementation.

Evaluating Metrics for Autoscaling

The metrics used by the controller, for example the shoots metric above, could be evaluated in one of the following ways:

  • Directly, by looking at the capacity and allocatable fields in the Seed status and comparing to the actual resource consumption calculated by simply counting all shoots that meet a certain criteria (e.g. shoots that are scheduled onto the seed), then taking an average over all seeds in the set.
  • By sampling existing metrics exported for example by gardener-metrics-exporter.

The second approach decouples the seed autoscaler controller from the actual metrics evaluation, and therefore allows plugging in new metrics more easily. It also has the advantage that the exported metrics could also be used for other purposes, e.g. for triggering Prometheus alerts or building Grafana dashboards. It has the disadvantage that the seed autoscaler controller would depend on the metrics exporter to do its job properly.