[release/v1.56] Allow external CCMs to handle node objects before MC #1652

Merged

mfranczy
Contributor

@mfranczy mfranczy commented Jun 2, 2023

Manual backport of #1645

What this PR does / why we need it:

Allow external CCMs to handle failing node objects before MC takes any action.
This prevents a race condition between MC and CCM.

The PR introduces a handleNodeFailuresWithExternalCCM function that reacts to the node status discovered by the CCM's node lifecycle controller.

If an instance is not found at the cloud provider, it waits until the CCM deletes the node object, which allows MC to:

  • create a new instance at the cloud provider
  • initialize a new node object. The object should not be reused when an instance is recreated:
    for example, an instance foo that was deleted and recreated should initialize a completely new node object
    instead of reusing the old one, because reuse can cause problems when updating the node's metadata, such as its IP address.

If the node is shut down, it allows MC to react according to the specific cloud provider's requirements, which are to:

  • wait for the node to come online again, or
  • delete a machine that cannot be recovered
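
For reviewers who want the decision flow at a glance, the behaviour described above can be sketched roughly as follows. This is a minimal, self-contained illustration and not the code from this PR: `nodeState`, `action`, and `decide` are hypothetical names, the boolean fields are an assumed simplification of what the CCM reports, and the real handleNodeFailuresWithExternalCCM (whose signature is not shown in this description) may be structured differently.

```go
package main

import "fmt"

// action is a hypothetical label for what MC should do next. The names are
// illustrative only and are not taken from the machine-controller code base.
type action string

const (
	waitForNodeDeletion action = "wait for CCM to delete the node object"
	createInstance      action = "create a new instance and a fresh node object"
	waitForNodeOnline   action = "wait for the node to come online again"
	deleteMachine       action = "delete the unrecoverable machine"
	noAction            action = "nothing to do"
)

// nodeState is an assumed, simplified view of what the CCM's node lifecycle
// controller has discovered about a node and its backing instance.
type nodeState struct {
	instanceFound bool // the instance exists at the cloud provider
	nodeExists    bool // the node object still exists in the cluster
	shutDown      bool // the instance was reported as shut down
	recoverable   bool // provider-specific: the machine can come back online
}

// decide mirrors the two cases from the description: a missing instance
// (let CCM clean up the node first) and a shut-down node (wait or delete).
func decide(s nodeState) action {
	switch {
	case !s.instanceFound && s.nodeExists:
		// Do not touch the machine yet; the old node object must go first
		// so that it is not reused by the recreated instance.
		return waitForNodeDeletion
	case !s.instanceFound:
		return createInstance
	case s.shutDown && s.recoverable:
		return waitForNodeOnline
	case s.shutDown:
		return deleteMachine
	default:
		return noAction
	}
}

func main() {
	// Instance is gone but the node object is still present.
	fmt.Println(decide(nodeState{instanceFound: false, nodeExists: true}))
	// CCM has deleted the node object, so MC may recreate the instance.
	fmt.Println(decide(nodeState{instanceFound: false, nodeExists: false}))
	// Shut-down node on a provider where the machine cannot be recovered.
	fmt.Println(decide(nodeState{instanceFound: true, nodeExists: true, shutDown: true}))
}
```

The ordering of the cases is the point of the sketch: MC only recreates an instance once the stale node object is gone, which is what avoids the race with the CCM.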

Which issue(s) this PR fixes:

Fixes kubermatic/kubermatic#12218

What type of PR is this?
/kind bug

Special notes for your reviewer:

Does this PR introduce a user-facing change? Then add your Release Note here:

Allow external CCMs to handle failing node objects before MC.

Documentation:

NONE

Signed-off-by: Marcin Franczyk <marcin0franczyk@gmail.com>
@kubermatic-bot added the release-note, kind/bug, and dco-signoff: yes labels on Jun 2, 2023
@kubermatic-bot
Contributor

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: mfranczy

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@kubermatic-bot added the sig/cluster-management, sig/virtualization, approved, and size/L labels on Jun 2, 2023
@mfranczy
Contributor Author

mfranczy commented Jun 2, 2023

/test pull-machine-controller-e2e-kubevirt

@mfranczy
Contributor Author

mfranczy commented Jun 2, 2023

/retest

Without this change there is a race condition between MC and CCM,
because both check the status of instances at the cloud provider.
If MC reconciles an instance first, the kubelet reuses the old node object,
which is problematic if the IP address changes.

Signed-off-by: Marcin Franczyk <marcin0franczyk@gmail.com>
@mfranczy
Contributor Author

mfranczy commented Jun 3, 2023

/test pull-machine-controller-e2e-gce

@kubermatic-bot added the lgtm label on Jun 5, 2023
@kubermatic-bot
Contributor

LGTM label has been added.

Git tree hash: b1fcc5a94ebdd8353b55d9d2a4d8fd7bab71b0c6

Labels
  • approved - Indicates a PR has been approved by an approver from all required OWNERS files.
  • dco-signoff: yes - Denotes that all commits in the pull request have the valid DCO signoff message.
  • kind/bug - Categorizes issue or PR as related to a bug.
  • lgtm - Indicates that a PR is ready to be merged.
  • release-note - Denotes a PR that will be considered when it comes time to generate release notes.
  • sig/cluster-management - Denotes a PR or issue as being assigned to SIG Cluster Management.
  • sig/virtualization - Denotes a PR or issue as being assigned to SIG Virtualization.
  • size/L - Denotes a PR that changes 100-499 lines, ignoring generated files.
3 participants