
re-enable ILB test for VMSS #1177

Merged

Conversation

CecileRobertMichon
Contributor

This reverts commit c0d8e36.

What type of PR is this?
/kind failing-test

What this PR does / why we need it: Now that the root issue should be fixed in Azure, re-enable the test.

/hold until the fix is rolled out to all regions

Which issue(s) this PR fixes (optional, in fixes #<issue number>(, fixes #<issue_number>, ...) format, will close the issue(s) when PR gets merged):
Fixes #1115

Special notes for your reviewer:

Please confirm that if this PR changes any image versions, then that's the sole change this PR makes.

TODOs:

  • squashed commits
  • includes documentation
  • adds unit tests

Release note:

NONE
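For context, the ILB test exercises a Kubernetes Service of type LoadBalancer carrying the Azure internal load balancer annotation, which makes the cloud provider provision an internal LB instead of a public one. A minimal sketch of that kind of Service, assuming client-go and borrowing the web-ilb name that shows up in the e2e logs later in this thread (this is not the actual test code):

package main

import (
	"context"
	"fmt"

	corev1 "k8s.io/api/core/v1"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/apimachinery/pkg/util/intstr"
	"k8s.io/client-go/kubernetes"
	"k8s.io/client-go/tools/clientcmd"
)

// createInternalLBService creates a LoadBalancer Service that the Azure cloud
// provider reconciles as an internal load balancer (ILB) because of the
// azure-load-balancer-internal annotation. Illustrative only.
func createInternalLBService(ctx context.Context, cs kubernetes.Interface) error {
	svc := &corev1.Service{
		ObjectMeta: metav1.ObjectMeta{
			Name:      "web-ilb", // name borrowed from the e2e logs below
			Namespace: "default",
			Annotations: map[string]string{
				"service.beta.kubernetes.io/azure-load-balancer-internal": "true",
			},
		},
		Spec: corev1.ServiceSpec{
			Type:     corev1.ServiceTypeLoadBalancer,
			Selector: map[string]string{"app": "web"},
			Ports: []corev1.ServicePort{
				{Port: 80, TargetPort: intstr.FromInt(8080)},
			},
		},
	}
	_, err := cs.CoreV1().Services(svc.Namespace).Create(ctx, svc, metav1.CreateOptions{})
	return err
}

func main() {
	config, err := clientcmd.BuildConfigFromFlags("", clientcmd.RecommendedHomeFile)
	if err != nil {
		panic(err)
	}
	cs := kubernetes.NewForConfigOrDie(config)
	if err := createInternalLBService(context.Background(), cs); err != nil {
		panic(err)
	}
	fmt.Println("created Service default/web-ilb")
}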

@k8s-ci-robot added release-note-none Denotes a PR that doesn't merit a release note. do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. kind/failing-test Categorizes issue or PR as related to a consistently or frequently failing test. cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. labels Feb 16, 2021
@k8s-ci-robot added area/provider/azure Issues or PRs related to azure provider sig/cluster-lifecycle Categorizes an issue or PR as relevant to SIG Cluster Lifecycle. size/S Denotes a PR that changes 10-29 lines, ignoring generated files. labels Feb 16, 2021
Contributor

@devigned left a comment


/lgtm

@k8s-ci-robot added the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Feb 17, 2021
@devigned
Contributor

/retest

2 similar comments
@devigned
Contributor

/retest

@devigned
Contributor

/retest

@CecileRobertMichon
Contributor Author

The fix is now in all public cloud regions.

/hold cancel

@k8s-ci-robot removed the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Feb 22, 2021
@devigned
Contributor

/retest

@CecileRobertMichon
Contributor Author

/approve

@k8s-ci-robot
Contributor

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: CecileRobertMichon

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@k8s-ci-robot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Feb 22, 2021
@k8s-ci-robot removed the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Feb 22, 2021
@CecileRobertMichon
Contributor Author

rebased on top of #1186

Contributor

@devigned left a comment


/lgtm

@k8s-ci-robot added the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Feb 22, 2021
@devigned
Contributor

/retest

2 similar comments
@CecileRobertMichon
Contributor Author

/retest

@devigned
Contributor

/retest

@devigned
Contributor

/retest
IPv6 fail

@k8s-ci-robot added the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Feb 25, 2021
@k8s-ci-robot removed lgtm "Looks good to me", indicates that a PR is ready to be merged. needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. labels Feb 25, 2021
@CecileRobertMichon
Contributor Author

/retest

2 similar comments
@CecileRobertMichon
Contributor Author

/retest

@CecileRobertMichon
Contributor Author

/retest

This reverts commit c0d8e36.
@CecileRobertMichon
Contributor Author

rebased on top of #1197

@CecileRobertMichon
Contributor Author

CecileRobertMichon commented Mar 4, 2021

/retest

looking into the failure separately

Timed out after 300.114s.
Deployment default/web-windows failed
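The 300s figure above suggests the e2e run polls for the Deployment to become available and gives up after a fixed window. A minimal sketch of that kind of wait, assuming client-go; waitForDeploymentAvailable and the 5-minute budget are illustrative, not the e2e framework's actual helper:

package main

import (
	"context"
	"fmt"
	"time"

	appsv1 "k8s.io/api/apps/v1"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/apimachinery/pkg/util/wait"
	"k8s.io/client-go/kubernetes"
	"k8s.io/client-go/tools/clientcmd"
)

// waitForDeploymentAvailable polls until the Deployment reports as many
// available replicas as desired, or the timeout expires.
func waitForDeploymentAvailable(ctx context.Context, cs kubernetes.Interface, namespace, name string, timeout time.Duration) error {
	return wait.PollImmediate(10*time.Second, timeout, func() (bool, error) {
		d, err := cs.AppsV1().Deployments(namespace).Get(ctx, name, metav1.GetOptions{})
		if err != nil {
			return false, nil // treat lookup errors as transient and keep polling
		}
		return deploymentAvailable(d), nil
	})
}

func deploymentAvailable(d *appsv1.Deployment) bool {
	desired := int32(1)
	if d.Spec.Replicas != nil {
		desired = *d.Spec.Replicas
	}
	return d.Status.AvailableReplicas >= desired
}

func main() {
	config, err := clientcmd.BuildConfigFromFlags("", clientcmd.RecommendedHomeFile)
	if err != nil {
		panic(err)
	}
	cs := kubernetes.NewForConfigOrDie(config)
	// Roughly what the failed expectation above amounts to: web-windows must
	// become available within about five minutes or the test fails.
	if err := waitForDeploymentAvailable(context.Background(), cs, "default", "web-windows", 5*time.Minute); err != nil {
		fmt.Println("Deployment default/web-windows failed:", err)
	}
}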

@devigned
Contributor

devigned commented Mar 4, 2021

@CecileRobertMichon it looks like Windows VMSS is having problems with the LB, but the cluster itself provisioned successfully.

@CecileRobertMichon
Contributor Author

Digging into the error, here are the relevant logs:

I0304 22:33:35.721937       1 event.go:291] "Event occurred" object="default/web-elb" kind="Service" apiVersion="v1" type="Normal" reason="EnsuringLoadBalancer" message="Ensuring load balancer"
I0304 22:33:35.742696       1 azure_loadbalancer.go:677] reconcileLoadBalancer for service(default/web-elb) - wantLb(true): started
I0304 22:33:35.743157       1 endpoints_controller.go:349] Finished syncing service "default/web-elb" endpoints. (223.8µs)
I0304 22:33:35.743365       1 endpointslice_controller.go:297] Finished syncing service "default/web-elb" endpoint slices. (167.5µs)
I0304 22:33:35.749377       1 resource_quota_monitor.go:354] QuotaMonitor process object: /v1, Resource=endpoints, namespace default, name web-ilb, uid 4ed7f798-645f-4c28-8f7b-77463ac819f7, event type delete
I0304 22:33:35.749427       1 endpointslicemirroring_controller.go:273] syncEndpoints("default/web-ilb")
I0304 22:33:35.749450       1 endpointslicemirroring_controller.go:283] default/web-ilb Endpoints not found, cleaning up any mirrored EndpointSlices
I0304 22:33:35.749471       1 endpointslicemirroring_controller.go:270] Finished syncing EndpointSlices for "default/web-ilb" Endpoints. (45.7µs)
I0304 22:33:35.751497       1 endpoints_controller.go:349] Finished syncing service "default/web-ilb" endpoints. (35.989983ms)
I0304 22:33:35.754035       1 endpoints_controller.go:349] Finished syncing service "default/web-ilb" endpoints. (2.475305ms)
I0304 22:33:35.810087       1 azure_backoff.go:251] LoadBalancerClient.List(capz-e2e-v1z4gy) success
I0304 22:33:35.810124       1 azure_loadbalancer.go:685] reconcileLoadBalancer for service(default/web-elb): lb(capz-e2e-v1z4gy/capz-e2e-v1z4gy) wantLb(true) resolved load balancer name
I0304 22:33:35.817136       1 request.go:581] Throttling request took 85.396697ms, request: GET:https://10.0.0.4:6443/apis/apiextensions.k8s.io/v1?timeout=32s
I0304 22:33:35.832682       1 azure_wrap.go:109] Public IP "capz-e2e-v1z4gy-aea7219be404141198eff19df26913bd" not found
I0304 22:33:35.832762       1 azure_loadbalancer.go:542] ensurePublicIPExists for service(default/web-elb): pip(capz-e2e-v1z4gy-aea7219be404141198eff19df26913bd) - creating
I0304 22:33:35.832851       1 azure_loadbalancer.go:568] service(default/web-elb): pip(capz-e2e-v1z4gy-aea7219be404141198eff19df26913bd) - creating as ipv4 for clusterIP:10.102.207.34
I0304 22:33:35.832867       1 azure_loadbalancer.go:571] ensurePublicIPExists for service(default/web-elb): pip(capz-e2e-v1z4gy-aea7219be404141198eff19df26913bd) - creating
I0304 22:33:35.867104       1 request.go:581] Throttling request took 135.338112ms, request: GET:https://10.0.0.4:6443/apis/apiextensions.k8s.io/v1beta1?timeout=32s
I0304 22:33:35.917155       1 request.go:581] Throttling request took 185.373827ms, request: GET:https://10.0.0.4:6443/apis/scheduling.k8s.io/v1?timeout=32s
I0304 22:33:35.967130       1 request.go:581] Throttling request took 235.300842ms, request: GET:https://10.0.0.4:6443/apis/coordination.k8s.io/v1beta1?timeout=32s
I0304 22:33:36.018647       1 request.go:581] Throttling request took 286.80146ms, request: GET:https://10.0.0.4:6443/apis/coordination.k8s.io/v1?timeout=32s
I0304 22:33:36.067043       1 request.go:581] Throttling request took 335.180572ms, request: GET:https://10.0.0.4:6443/apis/node.k8s.io/v1beta1?timeout=32s
I0304 22:33:36.072926       1 garbagecollector.go:519] "Deleting object" object="default/web-ilb-mttnz" objectUID=7f1359d1-a7e6-4658-9ed6-8f2d1f8bca36 kind="EndpointSlice" propagationPolicy=Background
I0304 22:33:36.079710       1 resource_quota_monitor.go:354] QuotaMonitor process object: discovery.k8s.io/v1beta1, Resource=endpointslices, namespace default, name web-ilb-mttnz, uid 7f1359d1-a7e6-4658-9ed6-8f2d1f8bca36, event type delete
I0304 22:33:38.515205       1 azure_loadbalancer.go:1033] reconcileLoadBalancerRule lb name (capz-e2e-v1z4gy) rule name (aea7219be404141198eff19df26913bd-TCP-80)
I0304 22:33:38.515237       1 azure_loadbalancer.go:1033] reconcileLoadBalancerRule lb name (capz-e2e-v1z4gy) rule name (aea7219be404141198eff19df26913bd-TCP-443)
I0304 22:33:38.515265       1 azure_loadbalancer.go:956] reconcileLoadBalancer: reconcileLoadBalancer for service(default/web-elb): lb(capz-e2e-v1z4gy) - updating
I0304 22:33:38.946461       1 httplog.go:89] "HTTP" verb="GET" URI="/healthz" latency="75.5µs" userAgent="kube-probe/1.19" srcIP="127.0.0.1:46592" resp=200
I0304 22:33:40.875327       1 request.go:581] Throttling request took 93.734308ms, request: GET:https://10.0.0.4:6443/apis/apiextensions.k8s.io/v1beta1?timeout=32s
I0304 22:33:40.925399       1 request.go:581] Throttling request took 143.785712ms, request: GET:https://10.0.0.4:6443/apis/scheduling.k8s.io/v1?timeout=32s
I0304 22:33:40.975341       1 request.go:581] Throttling request took 193.714916ms, request: GET:https://10.0.0.4:6443/apis/scheduling.k8s.io/v1beta1?timeout=32s
I0304 22:33:41.025377       1 request.go:581] Throttling request took 239.143919ms, request: GET:https://10.0.0.4:6443/apis/coordination.k8s.io/v1beta1?timeout=32s
I0304 22:33:41.075373       1 request.go:581] Throttling request took 289.121422ms, request: GET:https://10.0.0.4:6443/apis/coordination.k8s.io/v1?timeout=32s
I0304 22:33:41.125324       1 request.go:581] Throttling request took 339.067226ms, request: GET:https://10.0.0.4:6443/apis/node.k8s.io/v1beta1?timeout=32s
I0304 22:33:41.126962       1 resource_quota_controller.go:423] no resource updates from discovery, skipping resource quota sync
I0304 22:33:41.193451       1 request.go:581] Throttling request took 90.833707ms, request: GET:https://10.0.0.4:6443/apis/coordination.k8s.io/v1?timeout=32s
I0304 22:33:41.243417       1 request.go:581] Throttling request took 140.783811ms, request: GET:https://10.0.0.4:6443/apis/apiextensions.k8s.io/v1beta1?timeout=32s
I0304 22:33:41.254298       1 node_lifecycle_controller.go:697] Node capz-e2e-v1z4gy-control-plane-k6sk5 was in a taint queue, but it's ready now. Ignoring taint request.
I0304 22:33:41.293537       1 request.go:581] Throttling request took 190.880215ms, request: GET:https://10.0.0.4:6443/apis/scheduling.k8s.io/v1?timeout=32s
I0304 22:33:41.343440       1 request.go:581] Throttling request took 240.767819ms, request: GET:https://10.0.0.4:6443/apis/scheduling.k8s.io/v1beta1?timeout=32s
I0304 22:33:41.393447       1 request.go:581] Throttling request took 290.735422ms, request: GET:https://10.0.0.4:6443/apis/node.k8s.io/v1beta1?timeout=32s
I0304 22:33:41.443432       1 request.go:581] Throttling request took 340.684826ms, request: GET:https://10.0.0.4:6443/apis/coordination.k8s.io/v1beta1?timeout=32s
I0304 22:33:43.960453       1 cronjob_controller.go:128] Found 0 jobs
I0304 22:33:43.960481       1 cronjob_controller.go:134] Found 0 groups
I0304 22:33:45.272480       1 reflector.go:515] k8s.io/client-go/informers/factory.go:134: Watch close - *v1.PodTemplate total 0 items received
I0304 22:33:48.845428       1 azure_vmss.go:931] ensuring node "capz-e2e-v1z4gy-mp-0000001" of scaleset "" in LB backendpool "/subscriptions/===REDACTED===/resourceGroups/capz-e2e-v1z4gy/providers/Microsoft.Network/loadBalancers/capz-e2e-v1z4gy/backendAddressPools/capz-e2e-v1z4gy"
I0304 22:33:48.845482       1 azure_vmss.go:184] Couldn't find VMSS VM with nodeName capz-e2e-v1z4gy-mp-0000001, refreshing the cache
I0304 22:33:48.935628       1 azure_vmss.go:931] ensuring node "win-p-win000001" of scaleset "" in LB backendpool "/subscriptions/===REDACTED===/resourceGroups/capz-e2e-v1z4gy/providers/Microsoft.Network/loadBalancers/capz-e2e-v1z4gy/backendAddressPools/capz-e2e-v1z4gy"
I0304 22:33:48.935679       1 azure_vmss.go:184] Couldn't find VMSS VM with nodeName win-p-win000001, refreshing the cache
I0304 22:33:48.947432       1 httplog.go:89] "HTTP" verb="GET" URI="/healthz" latency="111.5µs" userAgent="kube-probe/1.19" srcIP="127.0.0.1:46668" resp=200
I0304 22:33:48.988815       1 azure_vmss.go:1260] EnsureHostInPool begins to UpdateVMs for VMSS(capz-e2e-v1z4gy, win-p-win) with new backendPoolID /subscriptions/===REDACTED===/resourceGroups/capz-e2e-v1z4gy/providers/Microsoft.Network/loadBalancers/capz-e2e-v1z4gy/backendAddressPools/capz-e2e-v1z4gy
I0304 22:33:48.989407       1 azure_vmss.go:1260] EnsureHostInPool begins to UpdateVMs for VMSS(capz-e2e-v1z4gy, capz-e2e-v1z4gy-mp-0) with new backendPoolID /subscriptions/===REDACTED===/resourceGroups/capz-e2e-v1z4gy/providers/Microsoft.Network/loadBalancers/capz-e2e-v1z4gy/backendAddressPools/capz-e2e-v1z4gy
I0304 22:33:49.394061       1 reflector.go:515] k8s.io/client-go/informers/factory.go:134: Watch close - *v1.ReplicaSet total 18 items received
I0304 22:33:50.299613       1 reflector.go:369] k8s.io/client-go/informers/factory.go:134: forcing resync
I0304 22:33:50.361429       1 reflector.go:515] k8s.io/client-go/informers/factory.go:134: Watch close - *v1.CSIDriver total 0 items received
I0304 22:33:50.399076       1 pv_controller_base.go:523] resyncing PV controller
I0304 22:33:51.268968       1 node_lifecycle_controller.go:697] Node capz-e2e-v1z4gy-control-plane-k6sk5 was in a taint queue, but it's ready now. Ignoring taint request.
I0304 22:33:53.967027       1 cronjob_controller.go:128] Found 0 jobs
I0304 22:33:53.967054       1 cronjob_controller.go:134] Found 0 groups
I0304 22:33:55.433825       1 gc_controller.go:164] GC'ing orphaned
I0304 22:33:55.433860       1 gc_controller.go:227] GC'ing unscheduled pods which are terminating.
I0304 22:33:58.947128       1 httplog.go:89] "HTTP" verb="GET" URI="/healthz" latency="110.3µs" userAgent="kube-probe/1.19" srcIP="127.0.0.1:46754" resp=200
I0304 22:34:01.283564       1 node_lifecycle_controller.go:697] Node capz-e2e-v1z4gy-control-plane-k6sk5 was in a taint queue, but it's ready now. Ignoring taint request.
I0304 22:34:03.974929       1 cronjob_controller.go:128] Found 0 jobs
I0304 22:34:03.975034       1 cronjob_controller.go:134] Found 0 groups
I0304 22:34:05.290504       1 reflector.go:369] k8s.io/client-go/informers/factory.go:134: forcing resync
I0304 22:34:05.299889       1 reflector.go:369] k8s.io/client-go/informers/factory.go:134: forcing resync
I0304 22:34:05.399251       1 pv_controller_base.go:523] resyncing PV controller
I0304 22:34:05.451575       1 reflector.go:369] k8s.io/client-go/informers/factory.go:134: forcing resync
I0304 22:34:08.948899       1 httplog.go:89] "HTTP" verb="GET" URI="/healthz" latency="84.701µs" userAgent="kube-probe/1.19" srcIP="127.0.0.1:46838" resp=200
I0304 22:34:09.396277       1 azure_vmss.go:1052] ensureVMSSInPool: ensuring VMSS with backendPoolID /subscriptions/===REDACTED===/resourceGroups/capz-e2e-v1z4gy/providers/Microsoft.Network/loadBalancers/capz-e2e-v1z4gy/backendAddressPools/capz-e2e-v1z4gy
I0304 22:34:09.396414       1 azure_vmss.go:1082] ensureVMSSInPool begins to update VMSS map[capz-e2e-v1z4gy-mp-0:true win-p-win:true] with backendPoolID /subscriptions/===REDACTED===/resourceGroups/capz-e2e-v1z4gy/providers/Microsoft.Network/loadBalancers/capz-e2e-v1z4gy/backendAddressPools/capz-e2e-v1z4gy
I0304 22:34:09.396553       1 azure_vmss.go:1176] ensureVMSSInPool begins to update vmss(capz-e2e-v1z4gy-mp-0) with new backendPoolID /subscriptions/===REDACTED===/resourceGroups/capz-e2e-v1z4gy/providers/Microsoft.Network/loadBalancers/capz-e2e-v1z4gy/backendAddressPools/capz-e2e-v1z4gy
I0304 22:34:09.396636       1 azure_backoff.go:408] CreateOrUpdateVMSS: verify the status of the vmss being created or updated
I0304 22:34:11.227305       1 request.go:581] Throttling request took 93.677696ms, request: GET:https://10.0.0.4:6443/apis/apiextensions.k8s.io/v1beta1?timeout=32s
I0304 22:34:11.277296       1 request.go:581] Throttling request took 143.693548ms, request: GET:https://10.0.0.4:6443/apis/scheduling.k8s.io/v1?timeout=32s
I0304 22:34:11.297925       1 node_lifecycle_controller.go:697] Node capz-e2e-v1z4gy-control-plane-k6sk5 was in a taint queue, but it's ready now. Ignoring taint request.
I0304 22:34:11.327229       1 request.go:581] Throttling request took 193.6401ms, request: GET:https://10.0.0.4:6443/apis/scheduling.k8s.io/v1beta1?timeout=32s
I0304 22:34:11.377219       1 request.go:581] Throttling request took 243.612051ms, request: GET:https://10.0.0.4:6443/apis/coordination.k8s.io/v1?timeout=32s
I0304 22:34:11.427220       1 request.go:581] Throttling request took 293.590903ms, request: GET:https://10.0.0.4:6443/apis/coordination.k8s.io/v1beta1?timeout=32s
I0304 22:34:11.481596       1 request.go:581] Throttling request took 347.959059ms, request: GET:https://10.0.0.4:6443/apis/node.k8s.io/v1beta1?timeout=32s
I0304 22:34:11.483901       1 resource_quota_controller.go:423] no resource updates from discovery, skipping resource quota sync
I0304 22:34:11.545429       1 request.go:581] Throttling request took 93.357196ms, request: GET:https://10.0.0.4:6443/apis/authorization.k8s.io/v1beta1?timeout=32s
I0304 22:34:11.595368       1 request.go:581] Throttling request took 143.339048ms, request: GET:https://10.0.0.4:6443/apis/autoscaling/v1?timeout=32s
I0304 22:34:11.645422       1 request.go:581] Throttling request took 193.328399ms, request: GET:https://10.0.0.4:6443/apis/autoscaling/v2beta1?timeout=32s
I0304 22:34:11.695392       1 request.go:581] Throttling request took 243.324251ms, request: GET:https://10.0.0.4:6443/apis/batch/v1beta1?timeout=32s
I0304 22:34:11.745372       1 request.go:581] Throttling request took 293.275302ms, request: GET:https://10.0.0.4:6443/apis/batch/v1?timeout=32s
I0304 22:34:11.795929       1 request.go:581] Throttling request took 343.812255ms, request: GET:https://10.0.0.4:6443/apis/certificates.k8s.io/v1?timeout=32s
I0304 22:34:13.982410       1 cronjob_controller.go:128] Found 0 jobs
I0304 22:34:13.982454       1 cronjob_controller.go:134] Found 0 groups
I0304 22:34:15.434049       1 gc_controller.go:164] GC'ing orphaned
I0304 22:34:15.434102       1 gc_controller.go:227] GC'ing unscheduled pods which are terminating.
I0304 22:34:16.382825       1 reflector.go:515] k8s.io/client-go/informers/factory.go:134: Watch close - *v1.Secret total 54 items received
I0304 22:34:18.950251       1 httplog.go:89] "HTTP" verb="GET" URI="/healthz" latency="104.7µs" userAgent="kube-probe/1.19" srcIP="127.0.0.1:46934" resp=200
I0304 22:34:20.300097       1 reflector.go:369] k8s.io/client-go/informers/factory.go:134: forcing resync
I0304 22:34:20.399555       1 pv_controller_base.go:523] resyncing PV controller
I0304 22:34:21.286623       1 reflector.go:515] k8s.io/client-go/informers/factory.go:134: Watch close - *v1.Service total 13 items received
I0304 22:34:21.330068       1 node_lifecycle_controller.go:697] Node capz-e2e-v1z4gy-control-plane-k6sk5 was in a taint queue, but it's ready now. Ignoring taint request.
I0304 22:34:23.988349       1 cronjob_controller.go:128] Found 0 jobs
I0304 22:34:23.988389       1 cronjob_controller.go:134] Found 0 groups
I0304 22:34:27.352644       1 reflector.go:515] k8s.io/client-go/informers/factory.go:134: Watch close - *v1beta1.CronJob total 0 items received
I0304 22:34:28.303279       1 reflector.go:515] k8s.io/client-go/informers/factory.go:134: Watch close - *v1.DaemonSet total 36 items received
I0304 22:34:28.946473       1 httplog.go:89] "HTTP" verb="GET" URI="/healthz" latency="84.9µs" userAgent="kube-probe/1.19" srcIP="127.0.0.1:47060" resp=200
I0304 22:34:30.262337       1 reflector.go:515] k8s.io/client-go/informers/factory.go:134: Watch close - *v1.Endpoints total 535 items received
I0304 22:34:30.383263       1 reflector.go:515] k8s.io/client-go/informers/factory.go:134: Watch close - *v1.ServiceAccount total 85 items received
I0304 22:34:31.353992       1 node_lifecycle_controller.go:697] Node capz-e2e-v1z4gy-control-plane-k6sk5 was in a taint queue, but it's ready now. Ignoring taint request.
I0304 22:34:31.674743       1 reflector.go:515] k8s.io/client-go/informers/factory.go:134: Watch close - *v1beta1.RuntimeClass total 0 items received
I0304 22:34:33.310076       1 reflector.go:515] k8s.io/client-go/informers/factory.go:134: Watch close - *v1.Role total 0 items received
I0304 22:34:33.994675       1 cronjob_controller.go:128] Found 0 jobs
I0304 22:34:33.994702       1 cronjob_controller.go:134] Found 0 groups
I0304 22:34:34.614499       1 reflector.go:515] k8s.io/client-go/informers/factory.go:134: Watch close - *v1.MutatingWebhookConfiguration total 0 items received
I0304 22:34:35.290711       1 reflector.go:369] k8s.io/client-go/informers/factory.go:134: forcing resync
I0304 22:34:35.300309       1 reflector.go:369] k8s.io/client-go/informers/factory.go:134: forcing resync
I0304 22:34:35.399733       1 pv_controller_base.go:523] resyncing PV controller
I0304 22:34:35.434233       1 gc_controller.go:164] GC'ing orphaned
I0304 22:34:35.434369       1 gc_controller.go:227] GC'ing unscheduled pods which are terminating.
I0304 22:34:35.451777       1 reflector.go:369] k8s.io/client-go/informers/factory.go:134: forcing resync
I0304 22:34:37.270701       1 reflector.go:515] k8s.io/client-go/informers/factory.go:134: Watch close - *v1.CSINode total 12 items received
I0304 22:34:38.945276       1 httplog.go:89] "HTTP" verb="GET" URI="/healthz" latency="73.5µs" userAgent="kube-probe/1.19" srcIP="127.0.0.1:47160" resp=200
I0304 22:34:41.368216       1 node_lifecycle_controller.go:697] Node capz-e2e-v1z4gy-control-plane-k6sk5 was in a taint queue, but it's ready now. Ignoring taint request.
I0304 22:34:41.584138       1 request.go:581] Throttling request took 94.567432ms, request: GET:https://10.0.0.4:6443/apis/apiextensions.k8s.io/v1beta1?timeout=32s
I0304 22:34:41.634195       1 request.go:581] Throttling request took 144.612349ms, request: GET:https://10.0.0.4:6443/apis/scheduling.k8s.io/v1?timeout=32s
I0304 22:34:41.684210       1 request.go:581] Throttling request took 194.608566ms, request: GET:https://10.0.0.4:6443/apis/scheduling.k8s.io/v1beta1?timeout=32s
I0304 22:34:41.734138       1 request.go:581] Throttling request took 244.521783ms, request: GET:https://10.0.0.4:6443/apis/coordination.k8s.io/v1?timeout=32s
I0304 22:34:41.784207       1 request.go:581] Throttling request took 293.781299ms, request: GET:https://10.0.0.4:6443/apis/node.k8s.io/v1beta1?timeout=32s
I0304 22:34:41.834274       1 request.go:581] Throttling request took 340.076615ms, request: GET:https://10.0.0.4:6443/apis/coordination.k8s.io/v1beta1?timeout=32s
I0304 22:34:41.836183       1 resource_quota_controller.go:423] no resource updates from discovery, skipping resource quota sync
I0304 22:34:41.898650       1 request.go:581] Throttling request took 94.814332ms, request: GET:https://10.0.0.4:6443/apis/scheduling.k8s.io/v1beta1?timeout=32s
I0304 22:34:41.948612       1 request.go:581] Throttling request took 144.770649ms, request: GET:https://10.0.0.4:6443/apis/apiextensions.k8s.io/v1beta1?timeout=32s
I0304 22:34:41.998612       1 request.go:581] Throttling request took 194.745866ms, request: GET:https://10.0.0.4:6443/apis/scheduling.k8s.io/v1?timeout=32s
I0304 22:34:42.048636       1 request.go:581] Throttling request took 244.751782ms, request: GET:https://10.0.0.4:6443/apis/coordination.k8s.io/v1beta1?timeout=32s
I0304 22:34:42.098648       1 request.go:581] Throttling request took 294.732999ms, request: GET:https://10.0.0.4:6443/apis/coordination.k8s.io/v1?timeout=32s
I0304 22:34:42.149324       1 request.go:581] Throttling request took 345.393216ms, request: GET:https://10.0.0.4:6443/apis/node.k8s.io/v1beta1?timeout=32s
I0304 22:34:43.747457       1 reflector.go:515] k8s.io/client-go/informers/factory.go:134: Watch close - *v1.IngressClass total 0 items received
I0304 22:34:44.001124       1 cronjob_controller.go:128] Found 0 jobs
I0304 22:34:44.001153       1 cronjob_controller.go:134] Found 0 groups
I0304 22:34:46.944333       1 azure_vmss.go:1176] ensureVMSSInPool begins to update vmss(win-p-win) with new backendPoolID /subscriptions/===REDACTED===/resourceGroups/capz-e2e-v1z4gy/providers/Microsoft.Network/loadBalancers/capz-e2e-v1z4gy/backendAddressPools/capz-e2e-v1z4gy
I0304 22:34:46.944374       1 azure_backoff.go:408] CreateOrUpdateVMSS: verify the status of the vmss being created or updated
E0304 22:34:47.040394       1 azure_backoff.go:422] CreateOrUpdateVMSS: error CreateOrUpdate vmss(win-p-win): &{false 409 0001-01-01 00:00:00 +0000 UTC Retriable: false, RetryAfter: 0s, HTTPStatusCode: 409, RawError: {"error":{"code":"ResourceGroupBeingDeleted","message":"The resource group 'capz-e2e-v1z4gy' is in deprovisioning state and cannot perform this operation."}}}
E0304 22:34:47.040461       1 azure_vmss.go:1179] ensureVMSSInPool CreateOrUpdateVMSS(win-p-win) with new backendPoolID /subscriptions/===REDACTED===/resourceGroups/capz-e2e-v1z4gy/providers/Microsoft.Network/loadBalancers/capz-e2e-v1z4gy/backendAddressPools/capz-e2e-v1z4gy, err: <nil>
E0304 22:34:47.040690       1 azure_loadbalancer.go:162] reconcileLoadBalancer(default/web-elb) failed: Retriable: false, RetryAfter: 0s, HTTPStatusCode: 409, RawError: Retriable: false, RetryAfter: 0s, HTTPStatusCode: 409, RawError: {"error":{"code":"ResourceGroupBeingDeleted","message":"The resource group 'capz-e2e-v1z4gy' is in deprovisioning state and cannot perform this operation."}}
I0304 22:34:47.040735       1 controller.go:789] Finished syncing service "default/web-elb" (1m11.319330706s)
E0304 22:34:47.040753       1 controller.go:275] error processing service default/web-elb (will retry): failed to ensure load balancer: Retriable: false, RetryAfter: 0s, HTTPStatusCode: 409, RawError: Retriable: false, RetryAfter: 0s, HTTPStatusCode: 409, RawError: {"error":{"code":"ResourceGroupBeingDeleted","message":"The resource group 'capz-e2e-v1z4gy' is in deprovisioning state and cannot perform this operation."}}

So the ILB service got created and deleted successfully, but the ELB service times out. It looks like the LB rules were added successfully, but then CreateOrUpdateVMSS loops on "CreateOrUpdateVMSS: verify the status of the vmss being created or updated" until the test eventually times out and starts cleaning up the cluster, at which point it fails with ResourceGroupBeingDeleted (expected).
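Roughly, the shape of that loop (an illustrative sketch only, not the cloud provider's actual azure_backoff.go code): the reconcile keeps retrying the VMSS update until it succeeds or the surrounding deadline expires, so once cleanup starts deleting the resource group every retry comes back with the 409 above.

package main

import (
	"context"
	"errors"
	"fmt"
	"time"
)

// updateVMSS stands in for the Azure call that attaches the scale set to the
// LB backend pool; stubbed here to fail the way the logs above show.
func updateVMSS(ctx context.Context) error {
	return errors.New("HTTPStatusCode: 409, RawError: ResourceGroupBeingDeleted")
}

// createOrUpdateVMSSWithRetry illustrates the "verify the status of the vmss
// being created or updated" behavior: retry the update until it succeeds or
// the caller's deadline (here, the test timeout) is reached.
func createOrUpdateVMSSWithRetry(ctx context.Context) error {
	const backoff = 2 * time.Second
	for {
		err := updateVMSS(ctx)
		if err == nil {
			return nil
		}
		select {
		case <-ctx.Done():
			// The deadline expired (or cleanup began); surface the last error.
			return fmt.Errorf("giving up: %w", err)
		case <-time.After(backoff):
			// keep retrying
		}
	}
}

func main() {
	ctx, cancel := context.WithTimeout(context.Background(), 10*time.Second)
	defer cancel()
	fmt.Println(createOrUpdateVMSSWithRetry(ctx))
}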

Looking at activity logs for the RG I don't see any PUT errors, but I do see a bunch of VMSS writes. @devigned does any of this speak to you?

@devigned
Contributor

devigned commented Mar 5, 2021

Is it possible this is related to the VMSS extension being PUT repeatedly? https://kubernetes.slack.com/archives/CEX9HENG7/p1614951175153700?thread_ts=1614873280.120900&cid=CEX9HENG7

@CecileRobertMichon
Contributor Author

Is it possible this is related to the VMSS extension being PUT repeatedly? https://kubernetes.slack.com/archives/CEX9HENG7/p1614951175153700?thread_ts=1614873280.120900&cid=CEX9HENG7

Yes, I think it might be. I don't see how that would affect the LB reconcile though. The extension should be a no-op... I need to look into it.
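One way repeated extension PUTs could be made a true no-op is to diff the desired extension against what the VMSS already carries and skip the update when nothing changed. A minimal sketch of that idempotency check; vmssExtension and needsUpdate are hypothetical stand-ins, not the CAPZ or cloud-provider types:

package main

import (
	"fmt"
	"reflect"
)

// vmssExtension is a simplified stand-in for the Azure VMSS extension model.
type vmssExtension struct {
	Name      string
	Publisher string
	Type      string
	Version   string
	Settings  map[string]string
}

// needsUpdate reports whether the desired extension differs from the one the
// scale set already has; if it does not, the reconciler can skip the PUT and
// avoid triggering another VMSS update.
func needsUpdate(existing, desired vmssExtension) bool {
	return !reflect.DeepEqual(existing, desired)
}

func main() {
	existing := vmssExtension{
		Name:      "CustomScript",
		Publisher: "Microsoft.Compute",
		Type:      "CustomScriptExtension",
		Version:   "1.10",
		Settings:  map[string]string{"commandToExecute": "echo bootstrap"},
	}
	desired := existing // unchanged spec: the PUT can be skipped
	fmt.Println("needs update:", needsUpdate(existing, desired))
}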

@CecileRobertMichon
Contributor Author

Opened #1221 and #1217

/retest

@nader-ziada
Contributor

/test pull-cluster-api-provider-azure-e2e-windows

1 similar comment
@nader-ziada
Contributor

/test pull-cluster-api-provider-azure-e2e-windows

@nader-ziada
Contributor

another try just to make sure

/test pull-cluster-api-provider-azure-e2e-windows

@nader-ziada
Contributor

/lgtm

@k8s-ci-robot added the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Mar 15, 2021
@nader-ziada
Contributor

The last run failed on the Azure authentication issue; it should be fixed now.

/test pull-cluster-api-provider-azure-e2e

@nader-ziada
Contributor

/test pull-cluster-api-provider-azure-e2e-windows

Development

Successfully merging this pull request may close these issues.

E2E failure: LB GET failed after 5 attempts