
If nodes are not in a ready state when the controller manager boots, the cache is stale, causing a 15 min delay in LB deployment #363

Closed
jsturtevant opened this issue Aug 3, 2020 · 10 comments · Fixed by #537
@jsturtevant
Contributor

What happened:
When adding functionality to the e2e tests for CAPZ in kubernetes-sigs/cluster-api-provider-azure#857, I found the e2e tests would fail if run against a single control plane node cluster. This would also occasionally happen when running multi-node control plane clusters. It was initially reported, with some of the same messages, in #338.

What happens is that the controller manager comes online and queries the power status of the Azure machines. Since the machines are not available yet, they are not added to the cache. When the request for the load balancer comes in, the cache is queried, it reports that the node does not exist as a VMAS node, and the VMSS code path is attempted instead, hence the following error message. When the node is found in the cache, the correct code path is taken.

E0802 23:30:06.928574       1 azure_vmss.go:1116] EnsureHostInPool(default/ingress-nginx-ilb): backendPoolID(/subscriptions/b9d9436a-0c07-4fe8-b779-2c1030bd7997/resourceGroups/capz-e2e-72fll1/providers/Microsoft.Network/loadBalancers/capz-e2e-72fll1-internal/backendAddressPools/capz-e2e-72fll1) - failed to ensure host in pool: "not a vmss instance"
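
To make the routing concrete, here's a minimal sketch of that decision (hypothetical, simplified names, not the actual cloud-provider code): everything hinges on whether the node name is present in the cached set of availability-set nodes.

```go
package main

import (
	"errors"
	"fmt"
)

var errNotVMSSInstance = errors.New("not a vmss instance")

// ensureHostInPool is a simplified stand-in for the real routing: if the
// node name is missing from the cached set of availability-set (VMAS)
// nodes, the node is assumed to be a VMSS instance and the scale-set code
// path is taken, which then rejects a standalone VM.
func ensureHostInPool(nodeName string, cachedVMASNodes map[string]struct{}) error {
	if _, ok := cachedVMASNodes[nodeName]; ok {
		return nil // correct path: treat the node as an availability-set VM
	}
	return errNotVMSSInstance // stale cache: node wrongly routed to the VMSS path
}

func main() {
	// The cache was populated before the worker node joined, so only the
	// control plane node is present.
	cache := map[string]struct{}{"capz-e2e-72fll1-control-plane-wjqp5": {}}
	fmt.Println(ensureHostInPool("capz-e2e-72fll1-md-0-q2st9", cache))
	// prints: not a vmss instance
}
```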

I was able to confirm the cache was missing the node by adding some custom logs to the controller-manager. The initial request populates the cache with only the control plane node, since the worker nodes don't exist yet:

I0802 23:27:42.217347       1 azure_vmss.go:309] Can not extract scale set name from providerID (azure:///subscriptions/b9d9436a-0c07-4fe8-b779-2c1030bd7997/resourceGroups/capz-e2e-72fll1/providers/Microsoft.Compute/virtualMachines/capz-e2e-72fll1-control-plane-wjqp5), assuming it is mananaged by availability set: not a vmss instance
I0802 23:27:42.217487       1 azure_vmss.go:192] JJS: powerstatus node capz-e2e-72fll1-control-plane-wjqp5
I0802 23:27:42.217638       1 azure_vmss_cache.go:238] JJS: enter cache calls for nodes
I0802 23:27:42.217725       1 azure_vmss_cache.go:240] JJS: listing nodes for rg: capz-e2e-72fll1
I0802 23:27:42.651485       1 azure_backoff.go:100] VirtualMachinesClient.List(capz-e2e-72fll1) success
I0802 23:27:42.651641       1 azure_vmss_cache.go:246] JJS: nodes found: %!s(int=1)
I0802 23:27:42.651750       1 azure_vmss_cache.go:249] JJS: adding node to local cache {{%!s(*http.Response=<nil>)} %!s(*compute.Plan=<nil>) %!s(*compute.VirtualMachineProperties=&{0xc001fd3eb0 0xc002022960 <nil> 0xc001f13ea0 0xc0020280a0 <nil> <nil> <nil> <nil>   <nil> <nil> 0xc001fd3fd0 <nil> <nil> 0xc001fd3ea0}) %!s(*[]compute.VirtualMachineExtension=<nil>) %!s(*compute.VirtualMachineIdentity=<nil>) %!s(*[]string=&[1]) %!s(*string=0xc001fd2a60) %!s(*string=0xc001fd2a50) %!s(*string=0xc001fd2a70) %!s(*string=0xc001fd2a80) map[Name:%!s(*string=0xc001fd2a90) creationTimestamp:%!s(*string=0xc001fd2ab0) jobName:%!s(*string=0xc001fd2ad0) kubernetes.io_cluster_capz-e2e-72fll1-control-plane-wjqp5:%!s(*string=0xc001fd2af0) sigs.k8s.io_cluster-api-provider-azure_cluster_capz-e2e-72fll1:%!s(*string=0xc001fd2b10) sigs.k8s.io_cluster-api-provider-azure_role:%!s(*string=0xc001fd2b30)]}
I0802 23:27:42.651962       1 azure_vmss_cache.go:278] JJS: VMAS Nodes: map[capz-e2e-72fll1-control-plane-wjqp5:{}]
I0802 23:27:42.652194       1 azure_vmss_cache.go:280] JJS: asking for node capz-e2e-72fll1-control-plane-wjqp5
I0802 23:27:42.698224       1 azure_instances.go:232] InstanceShutdownByProviderID gets power status "running" for node "capz-e2e-72fll1-control-plane-wjqp5"

You can see the power status request for the new worker node here; the cached VMAS set still contains only the control plane node:

I0802 23:28:32.709471       1 azure_vmss.go:192] JJS: powerstatus node capz-e2e-72fll1-md-0-q2st9
I0802 23:28:32.709626       1 azure_vmss_cache.go:278] JJS: VMAS Nodes: map[capz-e2e-72fll1-control-plane-wjqp5:{}]
I0802 23:28:32.709652       1 azure_vmss_cache.go:280] JJS: asking for node capz-e2e-72fll1-md-0-q2st9
I0802 23:28:32.709836       1 azure_vmss.go:309] Can not extract scale set name from providerID (azure:///subscriptions/b9d9436a-0c07-4fe8-b779-2c1030bd7997/resourceGroups/capz-e2e-72fll1/providers/Microsoft.Compute/virtualMachines/capz-e2e-72fll1-md-0-q2st9), assuming it is mananaged by availability set: not a vmss instance

And the EnsureHostsInPool request still doesn't find the worker node in the VMAS cache:

I0802 23:30:06.928474       1 azure_vmss.go:1077] JJS: EnsureHostsInPool node capz-e2e-72fll1-md-0-q2st9
I0802 23:30:06.928488       1 azure_vmss_cache.go:278] JJS: VMAS Nodes: map[capz-e2e-72fll1-control-plane-wjqp5:{}]
I0802 23:30:06.928518       1 azure_vmss_cache.go:280] JJS: asking for node capz-e2e-72fll1-md-0-q2st9
I0802 23:30:06.928529       1 azure_vmss.go:1097] JJS: EnsureHostsInPool node false managedby VMAS
I0802 23:30:06.928542       1 azure_vmss.go:1113] JJS: EnsureHostsInPool entered EnsureHostInPool started%!(EXTRA bool=false)
I0802 23:30:06.928549       1 azure_vmss.go:832] ensuring node "capz-e2e-72fll1-md-0-q2st9" of scaleset "" in LB backendpool "/subscriptions/b9d9436a-0c07-4fe8-b779-2c1030bd7997/resourceGroups/capz-e2e-72fll1/providers/Microsoft.Network/loadBalancers/capz-e2e-72fll1-internal/backendAddressPools/capz-e2e-72fll1"
E0802 23:30:06.928574       1 azure_vmss.go:1116] EnsureHostInPool(default/ingress-nginx-ilb): backendPoolID(/subscriptions/b9d9436a-0c07-4fe8-b779-2c1030bd7997/resourceGroups/capz-e2e-72fll1/providers/Microsoft.Network/loadBalancers/capz-e2e-72fll1-internal/backendAddressPools/capz-e2e-72fll1) - failed to ensure host in pool: "not a vmss instance"

What you expected to happen:

How to reproduce it:
Follow the instructions in kubernetes-sigs/cluster-api-provider-azure#857, or create a single-node control plane cluster and then, after the control plane node is up, add a worker node.

Anything else we need to know?:
In multi-node control plane clusters, leader election occurs in the controller-manager and causes the cache to reload. This is why it only happens occasionally in that scenario.

I believe this could also happen if a node is added at a later time and is expected to be in the backend pool but isn't, so it could be important to fix.

Environment:

  • Kubernetes version (use kubectl version): 1.18.6
  • OS (e.g. from /etc/os-release):
  • Kernel (e.g. uname -a):
  • Install tools: CAPZ
  • Others:
@alexeldeib

alexeldeib commented Aug 3, 2020

Possibly related to cache read mode? https://github.com/kubernetes/kubernetes/blob/master/staging/src/k8s.io/legacy-cloud-providers/azure/azure_vmss.go#L1196 possibly should be CacheReadTypeUnsafe or CacheReadTypeForceRefresh? Still digging a bit.

VMSS does seem to force refresh if the instances aren't found: https://github.com/kubernetes/kubernetes/blob/ec560b9737537be8c688776461bc700e8ddedb9d/staging/src/k8s.io/legacy-cloud-providers/azure/azure_vmss.go#L276-L295

@aramase
Member

aramase commented Aug 3, 2020

Possibly related to cache read mode? https://github.com/kubernetes/kubernetes/blob/master/staging/src/k8s.io/legacy-cloud-providers/azure/azure_vmss.go#L1196 possibly should be CacheReadTypeUnsafe or CacheReadTypeForceRefresh? Still digging a bit.

  • CacheReadTypeDefault reads from the cache if the TTL has not expired. In case of a cache miss, it'll fetch the contents and populate the cache. For VMSS, if the VM instance is not found in the cache, the cache is force refreshed, but for AS it seems to be slightly different. We should probably align the logic so it's the same for a VMSS VM instance or AS (see the sketch after this list).
  • CacheReadTypeUnsafe means that even if the cache entry has expired, it's still safe to read the contents because they aren't expected to change.
  • CacheReadTypeForceRefresh force-refreshes the cache even if the cached entry is still valid.
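
To illustrate those semantics and the VMSS-side fallback (a toy sketch with hypothetical types and names, not the actual azcache package):

```go
package main

import (
	"fmt"
	"time"
)

// Hypothetical stand-ins for the real cache read types described above.
type cacheReadType int

const (
	cacheReadTypeDefault      cacheReadType = iota // honor the TTL
	cacheReadTypeUnsafe                            // serve even expired entries
	cacheReadTypeForceRefresh                      // always refetch
)

// timedCache is a toy cache keyed by node name.
type timedCache struct {
	data      map[string]string
	fetchedAt time.Time
	ttl       time.Duration
	fetch     func() map[string]string
}

func (c *timedCache) get(key string, crt cacheReadType) (string, bool) {
	expired := time.Since(c.fetchedAt) > c.ttl
	if crt == cacheReadTypeForceRefresh || (crt == cacheReadTypeDefault && expired) {
		c.data, c.fetchedAt = c.fetch(), time.Now()
	}
	v, ok := c.data[key]
	return v, ok
}

// getVMSSVM mirrors the VMSS-side fallback: on a miss with the default read
// type, retry once with a forced refresh before giving up. The
// availability-set lookup has no equivalent retry, which is why a stale set
// of VMAS nodes sticks around until the TTL expires.
func getVMSSVM(c *timedCache, nodeName string) (string, bool) {
	if v, ok := c.get(nodeName, cacheReadTypeDefault); ok {
		return v, true
	}
	return c.get(nodeName, cacheReadTypeForceRefresh)
}

func main() {
	listCalls := 0
	c := &timedCache{ttl: 10 * time.Minute, fetch: func() map[string]string {
		listCalls++
		return map[string]string{"node-1": "vm-1"} // node-2 hasn't been created yet
	}}
	c.get("node-1", cacheReadTypeDefault) // initial populate
	_, found := getVMSSVM(c, "node-2")
	fmt.Println("found:", found, "list calls:", listCalls) // found: false, list calls: 2
}
```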

@jsturtevant
Contributor Author

jsturtevant commented Aug 3, 2020

For VMSS, if the VM instance is not found in the cache, the cache is force refreshed, but for AS it seems to be slightly different.

For AS, I believe it's because the nodes are stored in the cache as a set. The set itself is the cached entry, so a request for a single instance doesn't count as a cache miss and doesn't trigger a refresh:

https://github.com/kubernetes/kubernetes/blob/ec560b9737537be8c688776461bc700e8ddedb9d/staging/src/k8s.io/legacy-cloud-providers/azure/azure_vmss_cache.go#L307
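
In other words, the cached entry is the whole set of availability-set node names, so a lookup for a node that joined after the set was built just returns "not present" rather than registering as a cache miss, and nothing triggers a refresh. A hypothetical sketch (not the actual code):

```go
package main

import "fmt"

// The cache entry is the entire set of VMAS node names, so the entry itself
// is always a "hit" even when the node being asked about is missing from it.
// A per-node lookup therefore never produces a cache miss that would force
// a refresh.
func isNodeManagedByAvailabilitySet(nodeName string, cachedVMASNodes map[string]struct{}) bool {
	_, ok := cachedVMASNodes[nodeName]
	return ok // false for any node that joined after the set was cached
}

func main() {
	cached := map[string]struct{}{"capz-e2e-72fll1-control-plane-wjqp5": {}}
	// The worker joined later, so it is silently misclassified as VMSS.
	fmt.Println(isNodeManagedByAvailabilitySet("capz-e2e-72fll1-md-0-q2st9", cached))
}
```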

@CecileRobertMichon
Contributor

@aramase @jsturtevant @alexeldeib do we have an idea of what is needed to fix this?

@fejta-bot

Issues go stale after 90d of inactivity.
Mark the issue as fresh with /remove-lifecycle stale.
Stale issues rot after an additional 30d of inactivity and eventually close.

If this issue is safe to close now please do so with /close.

Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/lifecycle stale

@k8s-ci-robot k8s-ci-robot added the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Dec 20, 2020
@fejta-bot

Stale issues rot after 30d of inactivity.
Mark the issue as fresh with /remove-lifecycle rotten.
Rotten issues close after an additional 30d of inactivity.

If this issue is safe to close now please do so with /close.

Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/lifecycle rotten

@k8s-ci-robot k8s-ci-robot added lifecycle/rotten Denotes an issue or PR that has aged beyond stale and will be auto-closed. and removed lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. labels Jan 19, 2021
@alexeldeib

/remove-lifecycle rotten

pretty sure this still exists...need to refresh my memory for a fix

@k8s-ci-robot k8s-ci-robot removed the lifecycle/rotten Denotes an issue or PR that has aged beyond stale and will be auto-closed. label Jan 19, 2021
@CecileRobertMichon
Contributor

/assign

looking into this

@CecileRobertMichon
Contributor

CecileRobertMichon commented Mar 4, 2021

So this fixes it: fcb1df0

The issue with that solution is that it will cause an unnecessary force refresh of the VM cache on every run of isNodeManagedByAvailabilitySet when the node is indeed VMSS (and so it really should return false).

The previous reasoning for calling it with CacheReadTypeUnsafe was probably that only control plane nodes are expected to be VMAS in a "vmss" cluster (e.g. an AKS Engine vmss cluster with vmas master nodes); those wouldn't change, since scaling control plane nodes up/down isn't supported, and new control plane nodes joining as part of an upgrade should trigger a cache refresh as leader election occurs.

Now in CAPZ we have both VM and VMSS worker nodes joining and leaving the cluster with scale ups/downs, which means that when we check whether a node is a VM or a VMSS instance, we need the latest information. A better solution might be to trigger a cache refresh only when the list of nodes in the cluster has changed since the last time the cache was refreshed (need to look into how feasible that is, though). @aramase @alexeldeib @jsturtevant wdyt?

@CecileRobertMichon
Contributor

Here's an attempt at refreshing the VM cache only when the node is new to the cluster: master...CecileRobertMichon:fix-vm-cache

This should eliminate most of the extra refreshes when the node is a known VMSS node, while still refreshing when the node is new to the cluster and we don't yet know whether it's a VM or a VMSS instance. I haven't tested that it works yet, but that's the general idea.
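
Roughly, the idea looks something like this (a hypothetical sketch with made-up names, not the code in that branch): only pay for a forced refresh when the node wasn't known at the time the caches were built.

```go
package main

import "fmt"

type cacheReadType int

const (
	cacheReadTypeUnsafe       cacheReadType = iota // serve the cached set, even if stale
	cacheReadTypeForceRefresh                      // re-list the VMs before answering
)

// isNodeManagedByAvailabilitySet sketches the proposed fix: keep the cheap
// (unsafe) read for nodes the caches have already seen, and force a refresh
// of the VM cache only when the node is new to the cluster, so a newly
// joined standalone VM isn't misclassified as a VMSS instance.
func isNodeManagedByAvailabilitySet(nodeName string, knownNodes map[string]struct{},
	getVMASNodes func(cacheReadType) map[string]struct{}) bool {

	readType := cacheReadTypeUnsafe
	if _, seen := knownNodes[nodeName]; !seen {
		readType = cacheReadTypeForceRefresh
	}
	_, isVMAS := getVMASNodes(readType)[nodeName]
	return isVMAS
}

func main() {
	// Nodes the caches knew about when they were last built.
	known := map[string]struct{}{"capz-e2e-72fll1-control-plane-wjqp5": {}}
	stale := map[string]struct{}{"capz-e2e-72fll1-control-plane-wjqp5": {}}
	fresh := map[string]struct{}{
		"capz-e2e-72fll1-control-plane-wjqp5": {},
		"capz-e2e-72fll1-md-0-q2st9":          {},
	}
	getVMASNodes := func(crt cacheReadType) map[string]struct{} {
		if crt == cacheReadTypeForceRefresh {
			return fresh // re-listing picks up the new standalone VM
		}
		return stale
	}
	// The new worker is unknown, so a forced refresh runs and it is classified correctly.
	fmt.Println(isNodeManagedByAvailabilitySet("capz-e2e-72fll1-md-0-q2st9", known, getVMASNodes))
}
```

A known VMSS node would already be in knownNodes and keep hitting the cheap path, which is where the savings over always forcing a refresh come from.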
