Gardener Controller Manager

The Gardener Controller Manager (often referred to as "GCM") is a component that runs next to the Gardener API server, similar to the Kubernetes Controller Manager. It runs several control loops that do not require talking to any seed or shoot cluster. As of today, it also exposes an HTTPS server that serves several webhook endpoints for certain resources.

This document explains the various functionalities of the Gardener Controller Manager and their purpose.

Control Loops

Project Controller

This controller consists of two reconciliation loops: the main loop reconciles Project resources, while the second loop takes the necessary actions for stale projects.

"Main" Reconciler

This reconciler will create a dedicated Namespace prefixed with garden- for each Project resource. The name of the namespace can either be stated in .spec.namespace, or it will be auto-generated by the reconciler. If .spec.namespace is set, the reconciler creates the namespace if it does not exist yet. Otherwise, it tries to adopt it. Adoption only succeeds if the Namespace was previously labeled with gardener.cloud/role=project and project.gardener.cloud/name=<project-name>. This prevents end-users from adopting arbitrary namespaces (e.g. the kube-system namespace) and thereby escalating their privileges.
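As a rough illustration of the adoption case, the manifests below sketch a pre-existing namespace that carries the two required labels, together with a Project that references it. The names dev and garden-dev are hypothetical, and the Project apiVersion is assumed to be core.gardener.cloud/v1beta1.

```yaml
# Namespace prepared for adoption (hypothetical names).
apiVersion: v1
kind: Namespace
metadata:
  name: garden-dev
  labels:
    gardener.cloud/role: project
    project.gardener.cloud/name: dev
---
# Project that points to the prepared namespace via .spec.namespace.
apiVersion: core.gardener.cloud/v1beta1   # assumed API version
kind: Project
metadata:
  name: dev
spec:
  namespace: garden-dev
```

Without both labels on the namespace, the reconciler would refuse to adopt it.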

After the namespace has been created or adopted, the reconciler creates several ClusterRoles and ClusterRoleBindings that allow the project members to access related resources based on their roles. These RBAC resources are prefixed with gardener.cloud:system:project{-member,-viewer}:<project-name>. Gardener administrators and extension developers can define their own roles, see this document for more information.

In addition, operators can configure the Project controller to maintain a default ResourceQuota for project namespaces. Quotas can, in particular, limit the creation of user-facing resources (e.g. Shoots, SecretBindings, Secrets). This protects the Garden cluster from massive resource exhaustion and also enables operators to align quotas with their enterprise policies.

⚠️ Gardener itself is not exempted from configured quotas. For example, Gardener creates Secrets for every shoot cluster in the project namespace and at the same time increases the available quota count. Please mind this additional resource consumption.

The GCM configuration provides a template section controllers.project.quotas where such a ResourceQuota (see example below) can be specified.

controllers:
  project:
    quotas:
    - config:
        apiVersion: v1
        kind: ResourceQuota
        spec:
          hard:
            count/shoots.core.gardener.cloud: "100"
            count/secretbindings.core.gardener.cloud: "10"
            count/secrets: "800"
      projectSelector: {}

The Project controller takes the shown config and creates a ResourceQuota with the name gardener in the project namespace. If a ResourceQuota resource with the name gardener already exists, the controller will only add those entries to spec.hard that are not present at that time. An optional projectSelector narrows down the set of projects that are equipped with the given config. If multiple configs match a project, only the first match in the list is applied to the project namespace.
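To make the "first match wins" rule more concrete, here is a minimal sketch with two quota entries. It assumes that projectSelector follows the standard Kubernetes label selector shape (matchLabels) and is evaluated against the labels of the Project. The label example.org/tier and all limits are hypothetical.

```yaml
controllers:
  project:
    quotas:
    # First entry: stricter limits for projects carrying a (hypothetical) trial label.
    - config:
        apiVersion: v1
        kind: ResourceQuota
        spec:
          hard:
            count/shoots.core.gardener.cloud: "5"
      projectSelector:
        matchLabels:
          example.org/tier: trial   # hypothetical label key and value
    # Second entry: empty selector, acts as a catch-all for all other projects.
    - config:
        apiVersion: v1
        kind: ResourceQuota
        spec:
          hard:
            count/shoots.core.gardener.cloud: "100"
      projectSelector: {}
```

Because only the first matching entry is applied, the more specific selector has to come before the catch-all entry.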

The .status.phase of the Project resources will be set to Ready or Failed by the reconciler to indicate whether the reconciliation loop was performed successfully. Also, it will generate Events to provide further information about its operations.

"Stale Projects" Reconciler

As Gardener is a large-scale Kubernetes as a Service, it is designed to be used by a large number of end-users. Over time, it is likely that some of the hundreds or thousands of Project resources are no longer actively used.

Gardener offers the "stale projects" reconciler which will take care of identifying such stale projects, marking them with a "warning", and eventually deleting them after a certain time period. This reconciler is enabled by default and works as follows:

  1. Projects are considered "stale" (not actively used) when all of the following conditions apply: The namespace associated with the Project does not have any...
    1. Shoot resources.
    2. Plant resources.
    3. BackupEntry resources.
    4. Secret resources that are referenced by a SecretBinding that is in use by a Shoot (not necessarily in the same namespace).
    5. Quota resources that are referenced by a SecretBinding that is in use by a Shoot (not necessarily in the same namespace).

If a project is considered "stale", its .status.staleSinceTimestamp will be set to the time when it was first detected to be stale. If it gets actively used again, this timestamp will be removed. After some time, the .status.staleAutoDeleteTimestamp will be set to a timestamp after which Gardener will auto-delete the Project resource if it is still not actively used.
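For orientation, the status of a stale Project might look roughly like the following sketch. The timestamps are made up and assume that the auto-delete timestamp lies 90 days after the project was first detected as stale.

```yaml
# Excerpt of a Project's status (hypothetical values).
status:
  phase: Ready
  staleSinceTimestamp: "2021-01-10T08:00:00Z"        # first detected as stale
  staleAutoDeleteTimestamp: "2021-04-10T08:00:00Z"   # 90 days later (assumed expiration time)
```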

The component configuration of the Gardener Controller Manager offers the following options:

  • minimumLifetimeDays: Don't consider newly created Projects as "stale" too early, to give end-users some time to onboard and get familiar with the system. The "stale project" reconciler won't set any timestamp for Projects younger than minimumLifetimeDays. When you change this value, projects marked as "stale" may no longer be marked as "stale" if they are young enough, or vice versa.
  • staleGracePeriodDays: Don't compute auto-delete timestamps for stale Projects that have been unused for less than staleGracePeriodDays. This avoids unnecessarily making end-users nervous "just because" they haven't actively used their Project for a given amount of time. When you change this value, already assigned auto-delete timestamps may be removed again if the new grace period is not yet exceeded.
  • staleExpirationTimeDays: Expiration time after which stale Projects are finally auto-deleted (counted from .status.staleSinceTimestamp). If this value is changed and an auto-delete timestamp has already been assigned to a project, the new value only takes effect if it is increased. Hence, decreasing staleExpirationTimeDays will not decrease already assigned auto-delete timestamps.
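The three options could be set as in the following sketch. It assumes that they live under controllers.project in the component configuration, next to the quotas section shown earlier. The values are illustrative, not recommendations.

```yaml
controllers:
  project:                        # assumed location of the stale-project options
    minimumLifetimeDays: 30       # projects younger than 30 days never get a stale timestamp
    staleGracePeriodDays: 14      # no auto-delete timestamp during the first 14 stale days
    staleExpirationTimeDays: 90   # auto-deletion 90 days after .status.staleSinceTimestamp
```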

Gardener administrators/operators can exclude specific Projects from the stale check by annotating the related Namespace resource with project.gardener.cloud/skip-stale-check=true.
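A minimal sketch of such an annotated namespace, using the hypothetical name garden-dev:

```yaml
apiVersion: v1
kind: Namespace
metadata:
  name: garden-dev   # hypothetical project namespace
  annotations:
    project.gardener.cloud/skip-stale-check: "true"   # annotation values must be strings
```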

Event Controller

With the Gardener Event Controller you can prolong the lifespan of events related to Shoot clusters. This is an optional controller which becomes active once you provide the configuration described below.

All events in Kubernetes are deleted after a configurable time-to-live, controlled via the kube-apiserver argument --event-ttl (defaulting to 1 hour). The need to prolong the time-to-live for Shoot cluster events frequently arises when debugging customer issues on live systems. This controller leaves events involving Shoots untouched while deleting all other events after a configured time. In order to activate it, provide the following configuration:

  • concurrentSyncs: The number of goroutines scheduled for reconciling events.
  • ttlNonShootEvents: When an event reaches this time-to-live it gets deleted unless it is a Shoot-related event (defaults to 1h, equivalent to the event-ttl default).

⚠️ In addition, you should also configure the --event-ttl for the kube-apiserver to define an upper-limit of how long Shoot-related events should be stored. The --event-ttl should be larger than the ttlNonShootEvents or this controller will have no effect.
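Put together, a configuration along these lines might look as follows. The section name controllers.event is an assumption about the component configuration layout, and the concrete values are only examples.

```yaml
controllers:
  event:                     # assumed section name for the Event Controller
    concurrentSyncs: 5       # hypothetical value
    ttlNonShootEvents: 1h    # non-Shoot events are deleted after one hour
# The kube-apiserver of the Garden cluster would then be started with a larger
# TTL, for example --event-ttl=24h (hypothetical value), which caps the
# lifetime of Shoot-related events.
```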

Shoot Reference Controller

Shoot objects may specify references to further objects in the Garden cluster which are required for certain features. For example, users can configure various DNS providers via .spec.dns.providers and usually need to refer to a corresponding secret that contains valid DNS provider credentials. Such objects need special protection against deletion requests as long as they are still referenced by one or more shoots.

Therefore, the Shoot Reference Controller scans Shoot objects for referenced objects and adds the finalizer gardener.cloud/reference-protection to their .metadata.finalizers list. The scanned shoot also gets this finalizer to enable proper garbage collection in case the Gardener Controller Manager is offline at the moment of an incoming deletion request. When an object is no longer actively referenced, because the shoot specification has changed or all related shoots have been deleted or are in deletion, the controller removes the added finalizer again so that the object can safely be deleted or garbage collected.
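As an illustration, a protected DNS provider secret would carry the finalizer roughly like this. The secret name and namespace are hypothetical, and the secret data is omitted.

```yaml
apiVersion: v1
kind: Secret
metadata:
  name: my-dns-credentials   # hypothetical secret referenced by a shoot
  namespace: garden-dev      # hypothetical project namespace
  finalizers:
  - gardener.cloud/reference-protection   # added by the Shoot Reference Controller
```

While the finalizer is present, a deletion request only sets the deletion timestamp, and the object stays in place until the finalizer is removed.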

The Shoot Reference Controller can inspect the following references:

  • Enabled by default:
    • DNS provider secrets (.spec.dns.providers)
  • Disabled by default:
    • Audit policy configmaps (.spec.kubernetes.kubeAPIServer.auditConfig.auditPolicy.configMapRef)

If you want to enable the audit policy configmap protection, set .controllers.shootReference.protectAuditPolicyConfigMaps to true in the component configuration.
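In the component configuration, this corresponds to the following fragment:

```yaml
controllers:
  shootReference:
    protectAuditPolicyConfigMaps: true   # also protect referenced audit policy ConfigMaps
```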

Further checks might be added in the future.