Implementing MachineType Successors in Gardener #57

@robinschneider

How to categorize this topic?

/area ops-productivity
/kind enhancement
/label teamsize/medium

What is the topic about?:
Enhance the CloudProfile schema to align the MachineType definition with the MachineImage lifecycle proposed in GEP-32, and extend it with the fields successor and migrationPolicy.

# Current Date: 2026-04-15
apiVersion: core.gardener.cloud/v1beta1
kind: CloudProfile
spec:
  machineTypes:
    - name: m5.large
      cpu: 2
      gpu: 0
      memory: 8Gi
      lifecycle:
        - classification: preview
          # Implicitly starts if no startTime
        - classification: supported
          startTime: "2025-01-01T00:00:00Z"
        - classification: deprecated
          startTime: "2026-05-01T00:00:00Z"
        - classification: expired
          startTime: "2026-07-01T00:00:00Z"
          successor: m6i.large
          migrationPolicy: ForceUpgrade # Options: ForceUpgrade, Manual, BlockScaling

The successor field would define the replacement MachineType, while migrationPolicy would define how Gardener handles the transition. Possible values would be ForceUpgrade, Manual, or BlockScaling.
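To make the lifecycle semantics concrete, here is a minimal sketch of how the active classification could be derived from the stages at a given point in time. It assumes that stages are listed in chronological order and that a stage without startTime is active from the beginning; the helper names parse_time and current_classification are hypothetical and not part of Gardener.

```python
from datetime import datetime, timezone

# Lifecycle stages for m5.large, copied from the example above.
LIFECYCLE = [
    {"classification": "preview"},  # no startTime: active from the beginning
    {"classification": "supported", "startTime": "2025-01-01T00:00:00Z"},
    {"classification": "deprecated", "startTime": "2026-05-01T00:00:00Z"},
    {"classification": "expired", "startTime": "2026-07-01T00:00:00Z"},
]


def parse_time(value):
    """Parse a UTC timestamp of the form 2026-05-01T00:00:00Z."""
    parsed = datetime.strptime(value, "%Y-%m-%dT%H:%M:%SZ")
    return parsed.replace(tzinfo=timezone.utc)


def current_classification(lifecycle, now):
    """Return the classification of the latest stage that has started.

    Assumption: stages are listed in chronological order.
    Returns None if no stage has started yet.
    """
    current = None
    for stage in lifecycle:
        start = stage.get("startTime")
        if start is None or parse_time(start) <= now:
            current = stage["classification"]
    return current


today = datetime(2026, 4, 15, tzinfo=timezone.utc)
print(current_classification(LIFECYCLE, today))  # supported
```

On the date used in the example (2026-04-15), the deprecated stage has not started yet, so m5.large would still count as supported.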

ForceUpgrade

Once expired.startTime is reached, Gardener patches the Shoot spec to replace the expired MachineType with its successor.
This keeps the infrastructure free of technical debt, but it is the most disruptive option, especially for sensitive StatefulSets.
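As a rough illustration of what such a patch could look like, the following sketch replaces the machine type in every affected worker pool. It treats the Shoot as a plain dictionary and relies on the worker pools living under spec.provider.workers[].machine.type; the function name force_upgrade is hypothetical, and the real change would happen inside a Gardener controller rather than in a script.

```python
import copy


def force_upgrade(shoot, expired_type, successor):
    """Return a patched copy of the Shoot; the input is left untouched."""
    patched = copy.deepcopy(shoot)
    workers = patched.get("spec", {}).get("provider", {}).get("workers", [])
    for pool in workers:
        machine = pool.get("machine", {})
        # Only pools that still use the expired type are changed.
        if machine.get("type") == expired_type:
            machine["type"] = successor
    return patched


shoot = {
    "spec": {
        "provider": {
            "workers": [
                {"name": "pool-a", "machine": {"type": "m5.large"}},
                {"name": "pool-b", "machine": {"type": "m6i.xlarge"}},
            ]
        }
    }
}
patched = force_upgrade(shoot, "m5.large", "m6i.large")
print([p["machine"]["type"] for p in patched["spec"]["provider"]["workers"]])
```

Only pool-a is changed in this example. Whether the change triggers an immediate rolling update or waits for the maintenance window is one of the open questions below and is not modeled here.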

Manual

This offers the most user control and matches the current behavior: updating a cluster to a new MachineType once the old one reaches its end of life (EOL) remains a manual task.

BlockScaling

Existing nodes are allowed to stay, but the MachineControllerManager is blocked from creating new nodes of the expired type. This prevents new machines from being provisioned with a MachineType that is no longer available.
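The BlockScaling rule can be sketched as a small decision function. This is only an illustration under the assumption that the check receives the current classification and the configured policy; the name may_create_machine is hypothetical, and where such a check would actually live (admission, the MachineControllerManager, or elsewhere) is not decided in this proposal.

```python
def may_create_machine(classification, migration_policy):
    """Decide whether a new machine of this type may be provisioned.

    Existing machines are never affected by this check; it only gates
    the creation of additional machines.
    """
    if classification != "expired":
        return True
    # Assumption: only BlockScaling blocks creation for expired types.
    return migration_policy != "BlockScaling"


print(may_create_machine("deprecated", "BlockScaling"))  # True
print(may_create_machine("expired", "BlockScaling"))     # False
print(may_create_machine("expired", "Manual"))           # True
```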

For backwards compatibility, Manual or BlockScaling should be the default.

It should also be possible to override this on the Shoot if needed, e.g. via an annotation:

metadata:
  annotations:
    confirmation.gardener.cloud/skip-machine-type-migration: "true"
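A minimal sketch of how a controller might evaluate this annotation, assuming the Shoot is available as a dictionary and that only the string value "true" counts as an opt-out. The function name migration_skipped is hypothetical.

```python
SKIP_ANNOTATION = "confirmation.gardener.cloud/skip-machine-type-migration"


def migration_skipped(shoot):
    """Return True if the Shoot opts out of the machine type migration."""
    annotations = shoot.get("metadata", {}).get("annotations") or {}
    # Assumption: any value other than the string "true" is ignored.
    return annotations.get(SKIP_ANNOTATION) == "true"


shoot = {"metadata": {"annotations": {SKIP_ANNOTATION: "true"}}}
print(migration_skipped(shoot))  # True
print(migration_skipped({"metadata": {}}))  # False
```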

If the cloud provider unexpectedly makes a MachineType unavailable, the unavailable classification can be used.

The user should be warned when the current MachineType is deprecated and the successor should be recommended to them, e.g. via a Shoot constraint MachineTypeDeprecated analogous to existing version constraints.
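The warning could be assembled roughly as follows. This sketch assumes a simplified catalog that maps each machine type name to its current classification and successor, and it returns a condition-like dictionary with type, status, and message. Both the shape of the result and the meaning of the status value are assumptions for illustration, not the existing Gardener constraint format.

```python
def machine_type_deprecated_condition(used_types, catalog):
    """Build a MachineTypeDeprecated condition for the given machine types.

    catalog maps a machine type name to a dict with the keys
    "classification" and (optionally) "successor".
    """
    messages = []
    for name in sorted(set(used_types)):
        entry = catalog.get(name, {})
        if entry.get("classification") == "deprecated":
            hint = f"machine type {name} is deprecated"
            if entry.get("successor"):
                hint += f", consider migrating to {entry['successor']}"
            messages.append(hint)
    return {
        "type": "MachineTypeDeprecated",
        # Assumption: "True" means at least one deprecated type is in use.
        "status": "True" if messages else "False",
        "message": "; ".join(messages),
    }


catalog = {
    "m5.large": {"classification": "deprecated", "successor": "m6i.large"},
    "m6i.large": {"classification": "supported"},
}
print(machine_type_deprecated_condition(["m5.large", "m6i.large"], catalog))
```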

successor should be a single value. A user would always be free to pick any other MachineType, but Gardener should have one definitive successor.

Of course, everything is open to discussion, but here are some open questions:

  • What happens to worker pools whose configuration is incompatible with the new MachineType (e.g. zones)?
  • Will there be a rolling update, or will only the spec be changed and the change picked up by the next reconcile (i.e. the node pool is updated in the maintenance window)?
  • What about hibernated Shoots?
  • What happens, if the successor is deprecated or expired as well?
  • What happens, if the successor chain is circular?
  • How to handle NamespacedCloudProfile?
  • How does unavailable differ from expired? Will it start without a date?
  • Should the override annotation confirmation.gardener.cloud/skip-machine-type-migration
    apply only to ForceUpgrade (skipping the patch for one maintenance window),
    or should there be a symmetric mechanism for BlockScaling
    (e.g. gardener.cloud/operation=force-machine-type-migration to allow scaling despite the block)?
  • Who is allowed to set the override annotation? Should this be enforced via an admission plugin,
    so that regular project members cannot bypass an explicitly configured CloudProfile policy?
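For the two questions about successor chains, one possible answer is to follow the chain until a usable MachineType is found and to reject cycles explicitly. The sketch below illustrates that idea. It assumes a flat mapping from machine type name to its current classification and successor, and the function name resolve_successor is hypothetical. Whether Gardener should follow chains at all, or reject such CloudProfiles at validation time, remains open.

```python
def resolve_successor(name, machine_types):
    """Follow successor links until a type is neither deprecated nor expired.

    machine_types maps a name to a dict with the keys "classification"
    and (optionally) "successor". Raises ValueError on a circular chain,
    an unknown type, or a dead end without a successor.
    """
    seen = set()
    current = name
    while True:
        if current in seen:
            raise ValueError(f"circular successor chain at {current}")
        seen.add(current)
        entry = machine_types.get(current)
        if entry is None:
            raise ValueError(f"unknown machine type {current}")
        if entry.get("classification") not in ("deprecated", "expired"):
            return current
        successor = entry.get("successor")
        if successor is None:
            raise ValueError(f"{current} has no successor")
        current = successor


types = {
    "m4.large": {"classification": "expired", "successor": "m5.large"},
    "m5.large": {"classification": "deprecated", "successor": "m6i.large"},
    "m6i.large": {"classification": "supported"},
}
print(resolve_successor("m4.large", types))  # m6i.large
```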

Metadata

Labels

  • Q2/2026: This topic is relevant for the hackathon in Q2/2026.
  • area/ops-productivity: Operator productivity related (how to improve operations).
  • kind/enhancement: Enhancement, improvement, extension.
  • lifecycle/frozen: Indicates that an issue or PR should not be auto-closed due to staleness.
  • teamsize/medium: A team of 3 people.
