Skip admission webhook validation for deleting MachineDeployments#2017

Merged
kubermatic-bot merged 1 commit into main from fix/md-webhook-skip-deletion-validation
May 3, 2026

@buraksekili
Contributor

What this PR does / why we need it:
The MachineDeployment admission webhook (/machinedeployments) runs full cloud provider validation, including live API calls to resolve images, on every UPDATE. It does so even when the MachineDeployment has a DeletionTimestamp set and is being garbage collected.

When OSM's operating-system-config-controller removes its cleanup finalizer (kubermatic.io/cleanup-operating-system-configs) via workerClient.Update(), the UPDATE is routed through the webhook. If the provider image is invalid or no longer resolves (e.g. an AMI that was deleted from AWS), Validate() returns an error, the webhook denies the UPDATE, and OSM cannot remove its finalizer. The MachineDeployment stays stuck in Terminating indefinitely.
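
To make the failure sequence concrete, here is a minimal sketch of the pre-fix flow. It is not the machine-controller or OSM code: `object`, `removeFinalizer`, `webhookBeforeFix`, and `update` are hypothetical stand-ins, and the API server is modeled as a function that persists a change only if the webhook admits it.

```go
package main

import (
	"errors"
	"fmt"
)

const cleanupFinalizer = "kubermatic.io/cleanup-operating-system-configs"

// object is a hypothetical, stripped-down stand-in for a MachineDeployment.
type object struct {
	Finalizers []string
	Deleting   bool // true once a DeletionTimestamp is set
	ImageValid bool // whether the provider image still resolves
}

// removeFinalizer returns a copy of finalizers without name.
func removeFinalizer(finalizers []string, name string) []string {
	out := make([]string, 0, len(finalizers))
	for _, f := range finalizers {
		if f != name {
			out = append(out, f)
		}
	}
	return out
}

// webhookBeforeFix models the old behavior: every UPDATE is validated,
// regardless of whether the object is already being deleted.
func webhookBeforeFix(obj object) error {
	if !obj.ImageValid {
		return errors.New("provider image does not resolve")
	}
	return nil
}

// update models an API server UPDATE: the desired state is persisted
// only if the admission webhook allows it.
func update(stored *object, desired object, webhook func(object) error) error {
	if err := webhook(desired); err != nil {
		return fmt.Errorf("admission webhook denied the request: %w", err)
	}
	*stored = desired
	return nil
}
```

With a stored object that is deleting and has an unresolvable image, calling `update` with the finalizer removed returns an error and leaves the finalizer in place, which is the stuck-in-Terminating state described above.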

This PR adds a DeletionTimestamp early return to mutateMachineDeployments, placed before DeepCopy() and the full mutation/validation pipeline. Objects under deletion return Allowed: true immediately, skipping the unnecessary deep copy, re-marshaling, and cloud provider validation call.

Which issue(s) this PR fixes:
Fixes kubermatic/kubermatic#15802

What type of PR is this?
/kind bug

Special notes for your reviewer:

Does this PR introduce a user-facing change? Then add your Release Note here:

MachineDeployments can now be deleted even when the provider image is invalid or unresolvable. Previously, such MachineDeployments would get stuck in Terminating because the admission webhook blocked OSM from removing its cleanup finalizer.

Documentation:

NONE

Test issue:

TBD

The MachineDeployment admission webhook runs full cloud provider
validation (including live API calls) on UPDATEs, even when the
object is being deleted. This blocks OSM from removing its cleanup
finalizer when the provider image is invalid or unresolvable,
leaving MachineDeployments stuck in Terminating indefinitely.

Add a DeletionTimestamp early return to mutateMachineDeployments,
mirroring the pattern used in KKP's cluster mutator, OSM's
MachineDeployment mutation handler, and controller-runtime's
CustomDefaulter.

Fixes kubermatic/kubermatic#15802

Signed-off-by: Burak Sekili <32663655+buraksekili@users.noreply.github.com>
@kubermatic-bot added the release-note, kind/bug, dco-signoff: yes, docs/none, sig/cluster-management, and size/L labels on May 1, 2026
Member

@kron4eg kron4eg left a comment

/approve

should have been done long ago ❤️

@kubermatic-bot added the lgtm label on May 3, 2026
@kubermatic-bot
Contributor

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: kron4eg

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Details Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@kubermatic-bot
Contributor

LGTM label has been added.

Details

Git tree hash: fcb7db7075f90190d8f3e52af3643458ddde458c

@kubermatic-bot added the approved label on May 3, 2026
@kubermatic-triage-bot

/retest
This bot automatically retries jobs that failed/flaked on approved PRs

Review the full test history

Silence the bot with an /lgtm cancel or /hold comment for consistent failures.

@kubermatic-bot kubermatic-bot merged commit 7a607d8 into main May 3, 2026
12 checks passed
@kubermatic-bot kubermatic-bot deleted the fix/md-webhook-skip-deletion-validation branch May 3, 2026 21:34
@buraksekili
Contributor Author

/cherrypick release/v1.65

@buraksekili
Contributor Author

/cherrypick release/v1.64

@kubermatic-bot
Contributor

@buraksekili: new pull request created: #2032

Details

In response to this:

/cherrypick release/v1.65

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.

@kubermatic-bot
Contributor

@buraksekili: new pull request created: #2033

Details

In response to this:

/cherrypick release/v1.64

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.

Labels

approved Indicates a PR has been approved by an approver from all required OWNERS files.
dco-signoff: yes Denotes that all commits in the pull request have the valid DCO signoff message.
docs/none Denotes a PR that doesn't need documentation (changes).
kind/bug Categorizes issue or PR as related to a bug.
lgtm Indicates that a PR is ready to be merged.
release-note Denotes a PR that will be considered when it comes time to generate release notes.
sig/cluster-management Denotes a PR or issue as being assigned to SIG Cluster Management.
size/L Denotes a PR that changes 100-499 lines, ignoring generated files.

Development

Successfully merging this pull request may close these issues.

Machine deployment termination is blocked on image validation failure.

4 participants