Improved NIC creation and deletion logic #594

Merged
prashanth26 merged 1 commit into gardener:master from fix/azure-orphan-nics on Mar 25, 2021

Contributor

@prashanth26 prashanth26 commented Mar 15, 2021

  • NIC creation now adopts any existing NICs with matching names
  • NIC deletion is now confirmed by performing a GET on the NIC after the delete call
  • Improved creation logs to give more details

What this PR does / why we need it:

Which issue(s) this PR fixes:
Fixes https://github.com/gardener/machine-controller-manager/issues/544

Special notes for your reviewer:

Release note:

Azure: Improved NIC creation and deletion logic to handle NIC creation and deletions more gracefully.

@prashanth26 prashanth26 requested a review from a team as a code owner March 15, 2021 09:28
@gardener-robot gardener-robot added needs/review Needs review size/m Size of pull request is medium (see gardener-robot robot/bots/size.py) labels Mar 15, 2021
@gardener-robot-ci-3 gardener-robot-ci-3 added the reviewed/ok-to-test Has approval for testing (check PR in detail before setting this label because PR is run on CI/CD) label Mar 15, 2021
Comment on lines 509 to 510
// Change to something like below, however I am unable to find the helper method for this.
// if err != nil && management.IsResourceNotFoundError(err) {
Contributor Author

I am having some trouble validating the error code when the NIC doesn't exist. Any suggestions?

cc: @AxiomSamarth, @amshuman-kr

Collaborator

Contributor Author

Based on your suggestion, I have updated the logic here to this - https://github.com/gardener/machine-controller-manager/pull/594/files#diff-36365e28193d28c39f36ac4d9172fbf8eac69553361f6179fa3184f9cad9eb0cR1329-R1342. I hope this is more appropriate now.
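
For readers following this thread, the question above is how to tell "the NIC does not exist" apart from any other failure of the GET call. The following is a minimal sketch of that check, not the code merged in this PR. It uses only the Go standard library, and `apiError` is a hypothetical stand-in for whatever error type the cloud SDK returns with an HTTP status code attached.

```go
package main

import (
	"errors"
	"fmt"
	"net/http"
)

// apiError is a hypothetical error type carrying the HTTP status code of a
// failed API call. It stands in for the SDK's own error type.
type apiError struct {
	StatusCode int
	Message    string
}

func (e *apiError) Error() string {
	return fmt.Sprintf("status %d: %s", e.StatusCode, e.Message)
}

// isNotFound reports whether err, or any error it wraps, is an apiError
// with status 404. A nil error is never "not found".
func isNotFound(err error) bool {
	var apiErr *apiError
	return errors.As(err, &apiErr) && apiErr.StatusCode == http.StatusNotFound
}

func main() {
	notFound := &apiError{StatusCode: http.StatusNotFound, Message: "NIC not found"}
	wrapped := fmt.Errorf("getting NIC: %w", notFound)
	conflict := &apiError{StatusCode: http.StatusConflict, Message: "NIC in use"}

	fmt.Println(isNotFound(wrapped))  // true: errors.As unwraps the chain
	fmt.Println(isNotFound(conflict)) // false: some other API failure
	fmt.Println(isNotFound(nil))      // false: no error at all
}
```

Using `errors.As` rather than a direct type assertion keeps the check working when the error has been wrapped with additional context on its way up the call stack.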

@gardener-robot-ci-3 gardener-robot-ci-3 added needs/ok-to-test Needs approval for testing (check PR in detail before setting this label because PR is run on CI/CD) and removed reviewed/ok-to-test Has approval for testing (check PR in detail before setting this label because PR is run on CI/CD) labels Mar 15, 2021
@gardener-robot-ci-1 gardener-robot-ci-1 added the reviewed/ok-to-test Has approval for testing (check PR in detail before setting this label because PR is run on CI/CD) label Mar 16, 2021
@gardener-robot-ci-3 gardener-robot-ci-3 removed the reviewed/ok-to-test Has approval for testing (check PR in detail before setting this label because PR is run on CI/CD) label Mar 16, 2021
@prashanth26
Contributor Author

/invite @MSSedusch

Collaborator

@amshuman-kr amshuman-kr left a comment

Thanks @prashanth26! The PR itself looks good. Some optional nit-picks and design pattern suggestions below.

@@ -1035,20 +1047,64 @@ func (clients *azureDriverClients) checkOrphanDisks(ctx context.Context, resourc
}

func (clients *azureDriverClients) deleteNIC(ctx context.Context, resourceGroupName string, nicName string) error {
var (
nicDeletionTimeout = 10 * time.Minute
Collaborator

Constant?

Contributor Author

Comment on lines 1077 to 1079
sleepInterval = 10 * time.Second
maxSleepInterval = 3 * time.Minute
currentSleepTime = 0 * time.Second
Collaborator

Constants? Or even better, why not use client-go/backoff? https://github.com/kubernetes/client-go/blob/master/util/flowcontrol/backoff.go#L32-L38

Contributor Author

I was actually thinking of using a backoff client, but didn't know which one to use. Yes, this makes sense. Let me check it out.

Contributor Author

I looked into using the client-go/backoff mechanism. However, its use case looks a little different from ours: it seems to keep a bucket of backoff entries, keyed by name, for the objects that are being backed off.

So I looked for libraries more appropriate to our use case. I found this backoff library, which seems to be used by a lot of projects. Its usage is simple and works well in our case. The license, however, is MIT. I hope that is fine, as I see other vendors using similar licenses - https://github.com/cenkalti/backoff/blob/v4/LICENSE.
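
To illustrate the pattern under discussion (delete, then poll with a growing sleep until a GET confirms the resource is gone or a timeout expires), here is a minimal sketch using only the Go standard library. It is not the cenkalti/backoff API and not the code merged in this PR. `pollWithBackoff` and `simulateDeletion` are hypothetical names, and the intervals are scaled down to milliseconds so the example finishes quickly. The diff above uses a 10 second sleep interval, a 3 minute cap and a 10 minute timeout.

```go
package main

import (
	"context"
	"fmt"
	"time"
)

// pollWithBackoff calls check until it reports done, returns an error, or
// ctx is cancelled. The wait between attempts doubles each time and is
// capped at maxInterval.
func pollWithBackoff(ctx context.Context, initial, maxInterval time.Duration, check func() (bool, error)) error {
	wait := initial
	for {
		done, err := check()
		if err != nil {
			return err
		}
		if done {
			return nil
		}
		timer := time.NewTimer(wait)
		select {
		case <-ctx.Done():
			timer.Stop()
			return ctx.Err()
		case <-timer.C:
		}
		wait *= 2
		if wait > maxInterval {
			wait = maxInterval
		}
	}
}

// simulateDeletion pretends that a GET starts returning "not found" on the
// goneAfter-th attempt. It returns the number of attempts made and any error.
func simulateDeletion(timeoutMillis, goneAfter int) (int, error) {
	timeout := time.Duration(timeoutMillis) * time.Millisecond
	ctx, cancel := context.WithTimeout(context.Background(), timeout)
	defer cancel()

	attempts := 0
	err := pollWithBackoff(ctx, 5*time.Millisecond, 20*time.Millisecond, func() (bool, error) {
		attempts++
		return attempts >= goneAfter, nil
	})
	return attempts, err
}

func main() {
	attempts, err := simulateDeletion(1000, 3)
	fmt.Println(attempts, err)

	// A resource that never disappears within the timeout yields an error.
	_, err = simulateDeletion(20, 1000)
	fmt.Println(err)
}
```

Tying the loop to a `context.Context` means the overall deletion timeout and any caller cancellation are handled by the same mechanism, so no separate elapsed-time counter is needed.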

@gardener-robot gardener-robot added size/l Size of pull request is large (see gardener-robot robot/bots/size.py) needs/second-opinion Needs second review by someone else and removed size/m Size of pull request is medium (see gardener-robot robot/bots/size.py) labels Mar 19, 2021
@gardener-robot-ci-1 gardener-robot-ci-1 added reviewed/ok-to-test Has approval for testing (check PR in detail before setting this label because PR is run on CI/CD) and removed reviewed/ok-to-test Has approval for testing (check PR in detail before setting this label because PR is run on CI/CD) labels Mar 19, 2021
@gardener-robot-ci-2 gardener-robot-ci-2 added reviewed/ok-to-test Has approval for testing (check PR in detail before setting this label because PR is run on CI/CD) and removed reviewed/ok-to-test Has approval for testing (check PR in detail before setting this label because PR is run on CI/CD) labels Mar 19, 2021
@gardener-robot-ci-3 gardener-robot-ci-3 removed the reviewed/ok-to-test Has approval for testing (check PR in detail before setting this label because PR is run on CI/CD) label Mar 19, 2021
@gardener-robot-ci-2 gardener-robot-ci-2 added the reviewed/ok-to-test Has approval for testing (check PR in detail before setting this label because PR is run on CI/CD) label Mar 19, 2021
@gardener-robot-ci-1 gardener-robot-ci-1 removed the reviewed/ok-to-test Has approval for testing (check PR in detail before setting this label because PR is run on CI/CD) label Mar 19, 2021
@prashanth26
Contributor Author

/ok-to-test

@gardener-robot-ci-2 gardener-robot-ci-2 added the reviewed/ok-to-test Has approval for testing (check PR in detail before setting this label because PR is run on CI/CD) label Mar 19, 2021
@gardener-robot-ci-3 gardener-robot-ci-3 removed the reviewed/ok-to-test Has approval for testing (check PR in detail before setting this label because PR is run on CI/CD) label Mar 19, 2021
@gardener-robot-ci-2 gardener-robot-ci-2 added reviewed/ok-to-test Has approval for testing (check PR in detail before setting this label because PR is run on CI/CD) and removed reviewed/ok-to-test Has approval for testing (check PR in detail before setting this label because PR is run on CI/CD) labels Mar 19, 2021
@prashanth26
Contributor Author

/ok-to-test

@prashanth26 prashanth26 added reviewed/ok-to-test Has approval for testing (check PR in detail before setting this label because PR is run on CI/CD) and removed needs/ok-to-test Needs approval for testing (check PR in detail before setting this label because PR is run on CI/CD) labels Mar 19, 2021
@gardener-robot-ci-2 gardener-robot-ci-2 added needs/ok-to-test Needs approval for testing (check PR in detail before setting this label because PR is run on CI/CD) and removed reviewed/ok-to-test Has approval for testing (check PR in detail before setting this label because PR is run on CI/CD) labels Mar 19, 2021
@prashanth26
Contributor Author

/needs review

@prashanth26
Contributor Author

@AxiomSamarth - We will probably need to incorporate this change prior to the MCM azure provider release.

/*
NIC creation
*/
NIC, err := clients.nic.Get(ctx, resourceGroupName, nicName, "")
Collaborator

Hey Prashanth,

If this Get() call returns an error other than 404 Not Found, it will result in a nil pointer dereference at line 634 (VMParameters := d.getVMParameters(vmName, vmImageRef, *NIC.ID)), as there will be no valid reference to NIC.

Found this issue while I was incorporating this in OOT and wrote a UT for the same. Please let me know your thoughts about it.

Contributor Author

Nice catch. This is why we need tests. :)

I have tried to fix this scenario with this commit. Please let me know if it makes sense. The commit looks a little confusing, however in reality it is only one extra if condition. Not sure why GitHub shows the diff changes differently. Locally it looks much less.

Collaborator

Perfect! This works like a charm! Thank you :)
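
The three outcomes of the initial GET discussed in this thread (NIC found, NIC not found, any other error) can be sketched as follows. This is an illustration under stated assumptions, not the commit referenced above: `nicClient`, `fakeClient`, `errNotFound` and `errServer` are hypothetical names, and the sentinel errors stand in for real API responses.

```go
package main

import (
	"errors"
	"fmt"
)

// Hypothetical sentinel errors standing in for a 404 and for any other
// API failure.
var (
	errNotFound = errors.New("not found")
	errServer   = errors.New("internal server error")
)

type nic struct{ ID *string }

// nicClient is a hypothetical, simplified view of a NIC API client.
type nicClient interface {
	Get(name string) (*nic, error)
	Create(name string) (*nic, error)
}

// getOrCreateNIC adopts an existing NIC, creates one if the GET reports
// not found, and returns every other error instead of continuing with a
// nil NIC.
func getOrCreateNIC(c nicClient, name string) (*nic, error) {
	existing, err := c.Get(name)
	switch {
	case err == nil:
		// A NIC with this name already exists: adopt it.
	case errors.Is(err, errNotFound):
		existing, err = c.Create(name)
		if err != nil {
			return nil, fmt.Errorf("creating NIC %q: %w", name, err)
		}
	default:
		// Neither success nor not found: stop before anything dereferences the NIC.
		return nil, fmt.Errorf("getting NIC %q: %w", name, err)
	}
	if existing == nil || existing.ID == nil {
		return nil, fmt.Errorf("NIC %q has no ID", name)
	}
	return existing, nil
}

// fakeClient is an in-memory nicClient used only for this demonstration.
type fakeClient struct {
	nics   map[string]*nic
	getErr error
}

func (f *fakeClient) Get(name string) (*nic, error) {
	if f.getErr != nil {
		return nil, f.getErr
	}
	if n, ok := f.nics[name]; ok {
		return n, nil
	}
	return nil, errNotFound
}

func (f *fakeClient) Create(name string) (*nic, error) {
	id := "id-" + name
	n := &nic{ID: &id}
	f.nics[name] = n
	return n, nil
}

func main() {
	c := &fakeClient{nics: map[string]*nic{}}
	created, err := getOrCreateNIC(c, "nic-a")
	fmt.Println(*created.ID, err)

	c.getErr = errServer
	_, err = getOrCreateNIC(c, "nic-a")
	fmt.Println(err)
}
```

The explicit `default` branch is the "one extra if condition" idea: callers only ever see either a usable NIC with an ID or an error, never a nil NIC alongside a nil error.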

@gardener-robot-ci-1 gardener-robot-ci-1 added the reviewed/ok-to-test Has approval for testing (check PR in detail before setting this label because PR is run on CI/CD) label Mar 24, 2021
@gardener-robot-ci-3 gardener-robot-ci-3 removed the reviewed/ok-to-test Has approval for testing (check PR in detail before setting this label because PR is run on CI/CD) label Mar 24, 2021
return nil
}

klog.V(4).Infof("NIC doesn't existance for %q", nicName)
Collaborator

Can we keep the log as simple as NIC not found? The current one does not seem to be so good. Just a thought.

Contributor Author

Done. Here is the change - 3c0e156.

@gardener-robot-ci-3 gardener-robot-ci-3 added reviewed/ok-to-test Has approval for testing (check PR in detail before setting this label because PR is run on CI/CD) and removed reviewed/ok-to-test Has approval for testing (check PR in detail before setting this label because PR is run on CI/CD) labels Mar 25, 2021
Collaborator

@amshuman-kr amshuman-kr left a comment

Thanks a lot for the change, @prashanth26, and sorry for the delay in approving the PR. LGTM

@prashanth26
Contributor Author

Thanks @amshuman-kr. I will squash all changes, test them once more, and then merge.

- NIC creation now, adopts any existing NICs with matching name
- NIC deletion confirms deletion by performing GET on deletion
- Improved creation logs to give more details
- Revendored additional libraries
@gardener-robot-ci-1 gardener-robot-ci-1 added the reviewed/ok-to-test Has approval for testing (check PR in detail before setting this label because PR is run on CI/CD) label Mar 25, 2021
@gardener-robot-ci-3 gardener-robot-ci-3 removed the reviewed/ok-to-test Has approval for testing (check PR in detail before setting this label because PR is run on CI/CD) label Mar 25, 2021
@prashanth26
Contributor Author

Ran a final round of tests. Looks good to me. Merging now.

@prashanth26 prashanth26 merged commit 85382aa into gardener:master Mar 25, 2021
@prashanth26 prashanth26 deleted the fix/azure-orphan-nics branch April 28, 2021 03:34
Labels
needs/ok-to-test Needs approval for testing (check PR in detail before setting this label because PR is run on CI/CD) needs/review Needs review needs/second-opinion Needs second review by someone else size/l Size of pull request is large (see gardener-robot robot/bots/size.py)
Development

Successfully merging this pull request may close these issues.

Orphaned Azure NICs block subnet deletion afterwards
7 participants