Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

🐛 Let machinePools to honour NodeDeletionTimeout #10553

Open
wants to merge 1 commit into
base: main
Choose a base branch
from

Conversation

serngawy
Copy link

@serngawy serngawy commented May 3, 2024

What this PR does / why we need it:
Ignore unreachable cluster while deleting machinePools

Which issue(s) this PR fixes (optional, in fixes #<issue number>(, fixes #<issue_number>, ...) format, will close the issue(s) when PR gets merged):
Fixes #10544

@k8s-ci-robot k8s-ci-robot added cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. do-not-merge/needs-area PR is missing an area label labels May 3, 2024
@k8s-ci-robot
Copy link
Contributor

Welcome @serngawy!

It looks like this is your first PR to kubernetes-sigs/cluster-api 🎉. Please refer to our pull request process documentation to help your PR have a smooth ride to approval.

You will be prompted by a bot to use commands during the review process. Do not be afraid to follow the prompts! It is okay to experiment. Here is the bot commands documentation.

You can also check if kubernetes-sigs/cluster-api has its own contribution guidelines.

You may want to refer to our testing guide if you run into trouble with your tests not passing.

If you are having difficulty getting your pull request seen, please follow the recommended escalation practices. Also, for tips and tricks in the contribution process you may want to read the Kubernetes contributor cheat sheet. We want to make sure your contribution gets all the attention it needs!

Thank you, and welcome to Kubernetes. 😃

@k8s-ci-robot
Copy link
Contributor

Hi @serngawy. Thanks for your PR.

I'm waiting for a kubernetes-sigs member to verify that this patch is reasonable to test. If it is, they should reply with /ok-to-test on its own line. Until that is done, I will not automatically test new commits in this PR, but the usual testing commands by org members will still work. Regular contributors should join the org to skip this step.

Once the patch is verified, the new status will be reflected by the ok-to-test label.

I understand the commands that are listed here.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@k8s-ci-robot k8s-ci-robot added needs-ok-to-test Indicates a PR that requires an org member to verify it is safe to test. size/S Denotes a PR that changes 10-29 lines, ignoring generated files. labels May 3, 2024
Copy link
Contributor

@killianmuldoon killianmuldoon left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

/ok-to-test
/area machinepools

@k8s-ci-robot
Copy link
Contributor

@killianmuldoon: The label(s) area/machinepools cannot be applied, because the repository doesn't have them.

In response to this:

/ok-to-test
/area machinepools

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.

@k8s-ci-robot k8s-ci-robot added ok-to-test Indicates a non-member PR verified by an org member that is safe to test. and removed needs-ok-to-test Indicates a PR that requires an org member to verify it is safe to test. labels May 8, 2024
@killianmuldoon killianmuldoon added the area/machinepool Issues or PRs related to machinepools label May 8, 2024
@k8s-ci-robot k8s-ci-robot removed the do-not-merge/needs-area PR is missing an area label label May 8, 2024
@chrischdi
Copy link
Member

I think this one goes into a different direction then described in:

and

/hold

@k8s-ci-robot k8s-ci-robot added the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label May 8, 2024
@serngawy
Copy link
Author

/cc @mboersma

@serngawy
Copy link
Author

@mboersma would review the PR and let me know your thoughts.

@k8s-ci-robot k8s-ci-robot added size/M Denotes a PR that changes 30-99 lines, ignoring generated files. and removed size/S Denotes a PR that changes 10-29 lines, ignoring generated files. labels May 21, 2024
@serngawy serngawy force-pushed the issue-10544 branch 2 times, most recently from 022d53e to 6da8906 Compare May 21, 2024 19:39
@enxebre
Copy link
Member

enxebre commented Jun 19, 2024

/retitle Let machinePools to honour NodeDeletionTimeout

I dropped some more comments, looking good to me otherwise.
After merging we can consider follow up PR to wrap only the if !r.isNodesDeleteTimeoutPassed(machinepool) { block with a isDeleteNodeAllowed that checks the if guest kas is operational

@k8s-ci-robot k8s-ci-robot changed the title 🐛 Issue-10544 ignore unreachable cluster while deleting machinePool Let machinePools to honour NodeDeletionTimeout Jun 19, 2024
@serngawy serngawy changed the title Let machinePools to honour NodeDeletionTimeout 🐛 Let machinePools to honour NodeDeletionTimeout Jun 19, 2024
@sbueringer
Copy link
Member

I would also take a look once I get around to it

@enxebre
Copy link
Member

enxebre commented Jun 20, 2024

Dropped a few more nits.
/lgtm

otherwise

/assign @sbueringer

@k8s-ci-robot k8s-ci-robot added the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Jun 20, 2024
@k8s-ci-robot
Copy link
Contributor

LGTM label has been added.

Git tree hash: 7099007f9d7a3c07faed6267a6b79ab73302d62e

@k8s-ci-robot k8s-ci-robot removed the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Jun 26, 2024
@k8s-ci-robot
Copy link
Contributor

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by:
Once this PR has been reviewed and has the lgtm label, please ask for approval from enxebre. For more information see the Kubernetes Code Review Process.

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@enxebre
Copy link
Member

enxebre commented Jun 27, 2024

/lgtm

@k8s-ci-robot k8s-ci-robot added the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Jun 27, 2024
@k8s-ci-robot
Copy link
Contributor

LGTM label has been added.

Git tree hash: cafa1c9ed7f1ed97e2b9adc4cd0929645ea21e8f

Signed-off-by: melserngawy <melserng@redhat.com>
@k8s-ci-robot k8s-ci-robot removed the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Jul 4, 2024
@k8s-ci-robot
Copy link
Contributor

New changes are detected. LGTM label has been removed.

@sbueringer sbueringer added this to the v1.8 milestone Jul 10, 2024
return err
}
} else {
ctrl.LoggerFrom(ctx).Info("MachinePool %s NodeDeleteTimeout passed, force delete the machinePool", machinePool.Namespace, machinePool.Name)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
ctrl.LoggerFrom(ctx).Info("MachinePool %s NodeDeleteTimeout passed, force delete the machinePool", machinePool.Namespace, machinePool.Name)
ctrl.LoggerFrom(ctx).Info("NodeDeleteTimeout passed, skipping Node deletion")

No need to add the MachinePool. Controller-runtime already adds the reconciled object

(Please note: Info doesn't take a format string + args, it takes k/v pairs)

err: false,
mpExist: false,
result: reconcile.Result{},
err: true,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
err: true,
// Note: We expect an error here because patchHelper won't be able to patch the MachinePool after the finalizer was removed (and the MachinePool object is then gone)
// The important part is that we verify that the MachinePool doesn't exist anymore
err: true,

expected: expected{
mpExist: true,
result: reconcile.Result{},
err: true,
},
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
},
},
withTracker: true,

Otherwise we are always hitting the code path that the tracker is nil. I think that his not what we want to test here

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actually let's please drop the withTracker field and let's just always set the tracker. I don't see a reason why we should test the case where Tracker is not set, as this never happens at runtime

expected: expected{
mpExist: true,
result: reconcile.Result{},
err: true,
},
},
}

for i := range testCases {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If I understand correctly this is a bit racy:

  • In some cases we will only get errors if the patchHelper fails to update the MachinePool, if the MachinePool goes away quickly enough after reconcileDelete removes the finalizer
    • => I would suggest to drop the error check
  • In some cases mpExist is set to true, even though we expect the MachinePool to go away after the finalizer is removed
    • => I would recommend to change the check for mpExists to use Eventually & Consistently
			if tc.expected.mpExist {
				g.Consistently(func(g Gomega) {
					g.Expect(r.Client.Get(ctx, key, &expv1.MachinePool{})).To(Succeed())
				}).WithTimeout(2 * time.Second)
			} else {
				// We expect to fail retrieve the machinePool as it is deleted
				g.Eventually(func(g Gomega) {
					g.Expect(apierrors.IsNotFound(r.Client.Get(ctx, key, &expv1.MachinePool{}))).To(BeTrue())
				}).WithTimeout(5 * time.Second)
			}

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Once these changes are made, the assertions are correct, but I think the last test case doesn't succeed anymore. Looks like the MP deletion is not actually blocked with our current test setup

@sbueringer
Copy link
Member

I think we're almost there :)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
area/machinepool Issues or PRs related to machinepools cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. ok-to-test Indicates a non-member PR verified by an org member that is safe to test. size/L Denotes a PR that changes 100-499 lines, ignoring generated files.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Failed to delete machinePool for unreachable cluster
7 participants