
fix: decouple vmss with 0 instance from lb when deleting the service #2489

Merged
merged 2 commits into kubernetes-sigs:master from nilo19:fix/zero-vmss on Oct 14, 2022

Conversation

nilo19
Contributor

@nilo19 nilo19 commented Oct 12, 2022

What type of PR is this?

/kind bug

What this PR does / why we need it:

We parse the IDs of the ipConfigs in the LB backend pool to determine which VMSSes need to be decoupled from the LB. But a VMSS with 0 instances has no corresponding ipConfigs in the backend pool, so it never gets decoupled. This PR checks every cached VMSS and decouples any that is still bound to the LB.
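
A minimal sketch of the idea, using simplified stand-in types rather than the provider's real ones (the actual change works on the SDK structs and the VMSS caches in azure_vmss.go):

package main

import (
	"fmt"
	"strings"
)

// Simplified stand-in for the SDK's scale set type; illustrative only.
type cachedVMSS struct {
	Name         string
	BackendPools []string // backend pool IDs referenced by the VMSS network profile
}

// needsDecoupling reports whether a cached VMSS still references the backend
// pool, regardless of how many instances it currently has (0 included).
func needsDecoupling(vmss cachedVMSS, backendPoolID string) bool {
	for _, id := range vmss.BackendPools {
		if strings.EqualFold(id, backendPoolID) {
			return true
		}
	}
	return false
}

func main() {
	poolID := "/subscriptions/sub/resourceGroups/rg/providers/Microsoft.Network/loadBalancers/kubernetes/backendAddressPools/kubernetes"
	cached := []cachedVMSS{
		{Name: "vmss-with-0-instances", BackendPools: []string{poolID}},
		{Name: "vmss-detached"},
	}
	for _, vmss := range cached {
		if needsDecoupling(vmss, poolID) {
			fmt.Printf("decoupling %s from the load balancer\n", vmss.Name)
		}
	}
}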

Which issue(s) this PR fixes:

Fixes #2443

Special notes for your reviewer:

Does this PR introduce a user-facing change?

fix: decouple vmss with 0 instance from lb when deleting the service

Additional documentation e.g., KEPs (Kubernetes Enhancement Proposals), usage docs, etc.:


@k8s-ci-robot k8s-ci-robot added release-note Denotes a PR that will be considered when it comes time to generate release notes. kind/bug Categorizes issue or PR as related to a bug. cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. labels Oct 12, 2022
@netlify

netlify bot commented Oct 12, 2022

Deploy Preview for kubernetes-sigs-cloud-provide-azure canceled.

🔨 Latest commit: 5f97ea9
🔍 Latest deploy log: https://app.netlify.com/sites/kubernetes-sigs-cloud-provide-azure/deploys/6348c8cc428856000824c748

@k8s-ci-robot k8s-ci-robot added approved Indicates a PR has been approved by an approver from all required OWNERS files. size/L Denotes a PR that changes 100-499 lines, ignoring generated files. labels Oct 12, 2022
lastUpdate: time.Now().UTC(),
})
}
localCache.Store(*scaleSet.Name, &vmssEntry{
Contributor Author

Cache all VMSSes instead of only uniform ones. I don't think this will break the VMSS Flex logic, but @zmyzheng can correct me.

Contributor

This will break the logic.

@@ -926,17 +926,7 @@ func (fs *FlexScaleSet) EnsureBackendPoolDeleted(service *v1.Service, backendPoo
}
}

// 1. Ensure the backendPoolID is deleted from the VMSS.
Contributor Author

Since we ensure all vmss are decoupled from the lb in azure_vmss.go, we don't need it here, right?

Contributor

I think we still need it. Per an AKS request, I will add another vmType == vmssflex for pure VMSS Flex clusters. In that case we skip the node type check and only initialize FlexScaleSet (similar to the pure standalone VM cluster).
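
For illustration only, a hedged sketch of the vmType dispatch described above; the constant names and constructor shape are hypothetical, not the provider's actual API:

package main

import "fmt"

const (
	vmTypeStandard = "standard"
	vmTypeVMSS     = "vmss"
	vmTypeVMSSFlex = "vmssflex" // proposed mode for pure VMSS Flex clusters
)

type vmSet interface {
	EnsureBackendPoolDeleted(backendPoolID string) error
}

type availabilitySet struct{}
type scaleSet struct{ flex *flexScaleSet }
type flexScaleSet struct{}

func (availabilitySet) EnsureBackendPoolDeleted(string) error { return nil }
func (scaleSet) EnsureBackendPoolDeleted(string) error        { return nil }
func (flexScaleSet) EnsureBackendPoolDeleted(string) error    { return nil }

// newVMSet picks the VM set implementation from the configured vmType.
func newVMSet(vmType string) vmSet {
	switch vmType {
	case vmTypeVMSSFlex:
		// Pure Flex cluster: skip the per-node type check, use FlexScaleSet directly.
		return flexScaleSet{}
	case vmTypeVMSS:
		// Uniform/mixed cluster: ScaleSet dispatches per node and wraps FlexScaleSet.
		return scaleSet{flex: &flexScaleSet{}}
	default:
		return availabilitySet{}
	}
}

func main() {
	fmt.Printf("%T\n", newVMSet(vmTypeVMSSFlex))
}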

@@ -92,13 +92,11 @@ func (ss *ScaleSet) newVMSSCache() (*azcache.TimedCache, error) {
klog.Warning("failed to get the name of VMSS")
continue
}
if scaleSet.OrchestrationMode == "" || scaleSet.OrchestrationMode == compute.OrchestrationModeUniform {
Member

this has changed the original vmss flex behavior cc @zmyzheng

Contributor

We should not save VMSS Flex into the vmssCache. vmssCache is only used for VMSS Uniform (although the name is a little misleading). The VMSS Flex cache is inside ss.flexScaleSet.vmssFlexCache.

Contributor

The main reason we need two different caches is that the VM list API is different between Uniform and Flex.
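
As a rough illustration of why the caches stay separate, a sketch with hypothetical list helpers standing in for the different SDK calls:

package main

import "fmt"

// Hypothetical list helpers; the real provider wraps different Azure SDK calls
// for the two orchestration modes, which is why the caches are kept separate.
func listUniformInstances(vmssName string) []string {
	// Uniform: instances are enumerated under the scale set resource itself.
	return []string{vmssName + "_0", vmssName + "_1"}
}

func listFlexVMs(vmssID string) []string {
	// Flex: members are standalone VMs associated with the scale set.
	return []string{"flex-vm-a", "flex-vm-b"}
}

func main() {
	// Two caches, two refresh paths, mirroring vmssCache (Uniform) and
	// ss.flexScaleSet.vmssFlexCache (Flex).
	vmssCache := map[string][]string{"vmss-uniform": listUniformInstances("vmss-uniform")}
	vmssFlexCache := map[string][]string{"vmss-flex-id": listFlexVMs("vmss-flex-id")}
	fmt.Println(len(vmssCache), len(vmssFlexCache))
}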

Contributor Author

Thanks for the review. I rolled back the change and use the flex cache to get all VMSSes. Could you review again?

@coveralls

coveralls commented Oct 13, 2022

Coverage Status

Coverage decreased (-0.05%) to 79.868% when pulling 5f97ea9 on nilo19:fix/zero-vmss into 8d0fb7f on kubernetes-sigs:master.

Member

@feiskyer feiskyer left a comment

lgtm

Member

@andyzhangx andyzhangx left a comment

/lgtm

@k8s-ci-robot k8s-ci-robot added the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Oct 13, 2022
@andyzhangx
Member

/hold
Hold a moment for other comments.

@k8s-ci-robot k8s-ci-robot added the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Oct 13, 2022
klog.V(3).Infof("ensureBackendPoolDeletedFromVMSS: found vmss %s being deleted, skipping", to.String(vmss.Name))
return true
}
if vmss.VirtualMachineProfile.NetworkProfile.NetworkInterfaceConfigurations == nil {
Contributor

@zmyzheng zmyzheng Oct 13, 2022

When using VMSS Flex, it is possible that the VMSS Flex does not have a VM profile. We can skip a VMSS Flex that does not have a VM profile.

Contributor Author

Done
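
A minimal sketch of the resulting guard, with simplified stand-ins for the SDK types (the real check lives in ensureBackendPoolDeletedFromVMSS and may differ in detail):

package main

import "fmt"

// Simplified stand-ins for the SDK types; illustrative only.
type networkProfile struct {
	NetworkInterfaceConfigurations []string
}

type virtualMachineProfile struct {
	NetworkProfile *networkProfile
}

type virtualMachineScaleSet struct {
	Name                  string
	VirtualMachineProfile *virtualMachineProfile
}

// shouldSkip returns true when the scale set has no VM profile (possible for
// VMSS Flex) or no NIC configurations, so there is nothing to decouple.
func shouldSkip(vmss virtualMachineScaleSet) bool {
	return vmss.VirtualMachineProfile == nil ||
		vmss.VirtualMachineProfile.NetworkProfile == nil ||
		vmss.VirtualMachineProfile.NetworkProfile.NetworkInterfaceConfigurations == nil
}

func main() {
	flexWithoutProfile := virtualMachineScaleSet{Name: "vmss-flex"}
	fmt.Println(shouldSkip(flexWithoutProfile)) // true: skip it, nothing to decouple
}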

@k8s-ci-robot k8s-ci-robot removed the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Oct 14, 2022
@k8s-ci-robot
Contributor

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: andyzhangx, feiskyer, nilo19, zmyzheng

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:
  • OWNERS [andyzhangx,feiskyer,nilo19]

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@zmyzheng
Contributor

/lgtm

@k8s-ci-robot
Contributor

@zmyzheng: changing LGTM is restricted to collaborators

In response to this:

/lgtm

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@andyzhangx
Member

/hold cancel

@k8s-ci-robot k8s-ci-robot removed the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Oct 14, 2022
@nilo19
Contributor Author

nilo19 commented Oct 14, 2022

Need an LGTM.

@feiskyer
Member

/lgtm

@k8s-ci-robot k8s-ci-robot added the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Oct 14, 2022
@k8s-ci-robot k8s-ci-robot merged commit 905f6f0 into kubernetes-sigs:master Oct 14, 2022
@nilo19 nilo19 deleted the fix/zero-vmss branch October 15, 2022 00:14
@nilo19
Contributor Author

nilo19 commented Oct 15, 2022

/cherrypick release-1.25

@k8s-infra-cherrypick-robot

@nilo19: #2489 failed to apply on top of branch "release-1.25":

Applying: chore: update deploy-cluster.sh
Applying: fix: decouple vmss with 0 instance from lb when deleting the service
Using index info to reconstruct a base tree...
M	pkg/provider/azure_vmss.go
M	pkg/provider/azure_vmss_cache.go
M	pkg/provider/azure_vmss_test.go
M	pkg/provider/azure_vmssflex_test.go
Falling back to patching base and 3-way merge...
Auto-merging pkg/provider/azure_vmssflex_test.go
Auto-merging pkg/provider/azure_vmss_test.go
Auto-merging pkg/provider/azure_vmss_cache.go
Auto-merging pkg/provider/azure_vmss.go
CONFLICT (content): Merge conflict in pkg/provider/azure_vmss.go
error: Failed to merge in the changes.
hint: Use 'git am --show-current-patch=diff' to see the failed patch
Patch failed at 0002 fix: decouple vmss with 0 instance from lb when deleting the service
When you have resolved this problem, run "git am --continue".
If you prefer to skip this patch, run "git am --skip" instead.
To restore the original branch and stop patching, run "git am --abort".

In response to this:

/cherrypick release-1.25

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

Successfully merging this pull request may close these issues.

Should remove LB backend pool reference from VMSS with 0 capacity when updating backend pool type to nodeIP