
re-enable ILB test for VMSS #1177

Merged

Conversation

CecileRobertMichon
Contributor

This reverts commit c0d8e36.

What type of PR is this?
/kind failing-test

What this PR does / why we need it: Now that the root issue should be fixed in Azure, re-enable the test.

/hold until the fix is rolled out to all regions

Which issue(s) this PR fixes (optional, in fixes #<issue number>(, fixes #<issue_number>, ...) format, will close the issue(s) when PR gets merged):
Fixes #1115

Special notes for your reviewer:

Please confirm that if this PR changes any image versions, then that's the sole change this PR makes.

TODOs:

  • squashed commits
  • includes documentation
  • adds unit tests

Release note:

NONE
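For context, the ILB test exercises a Kubernetes Service of type LoadBalancer carrying the Azure internal load balancer annotation, which makes the cloud provider provision an internal LB instead of a public one. A minimal sketch of that kind of Service, assuming client-go and borrowing the web-ilb name that shows up in the e2e logs later in this thread (this is not the actual test code):

package main

import (
	"context"
	"fmt"

	corev1 "k8s.io/api/core/v1"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/apimachinery/pkg/util/intstr"
	"k8s.io/client-go/kubernetes"
	"k8s.io/client-go/tools/clientcmd"
)

// createInternalLBService creates a LoadBalancer Service that the Azure cloud
// provider reconciles as an internal load balancer (ILB) because of the
// azure-load-balancer-internal annotation. Illustrative only.
func createInternalLBService(ctx context.Context, cs kubernetes.Interface) error {
	svc := &corev1.Service{
		ObjectMeta: metav1.ObjectMeta{
			Name:      "web-ilb", // name borrowed from the e2e logs below
			Namespace: "default",
			Annotations: map[string]string{
				"service.beta.kubernetes.io/azure-load-balancer-internal": "true",
			},
		},
		Spec: corev1.ServiceSpec{
			Type:     corev1.ServiceTypeLoadBalancer,
			Selector: map[string]string{"app": "web"},
			Ports: []corev1.ServicePort{
				{Port: 80, TargetPort: intstr.FromInt(8080)},
			},
		},
	}
	_, err := cs.CoreV1().Services(svc.Namespace).Create(ctx, svc, metav1.CreateOptions{})
	return err
}

func main() {
	config, err := clientcmd.BuildConfigFromFlags("", clientcmd.RecommendedHomeFile)
	if err != nil {
		panic(err)
	}
	cs := kubernetes.NewForConfigOrDie(config)
	if err := createInternalLBService(context.Background(), cs); err != nil {
		panic(err)
	}
	fmt.Println("created Service default/web-ilb")
}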

@k8s-ci-robot added release-note-none Denotes a PR that doesn't merit a release note. do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. kind/failing-test Categorizes issue or PR as related to a consistently or frequently failing test. cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. labels Feb 16, 2021
@k8s-ci-robot added area/provider/azure Issues or PRs related to azure provider sig/cluster-lifecycle Categorizes an issue or PR as relevant to SIG Cluster Lifecycle. size/S Denotes a PR that changes 10-29 lines, ignoring generated files. labels Feb 16, 2021
Contributor

@devigned left a comment


/lgtm

@k8s-ci-robot added the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Feb 17, 2021
@devigned
Contributor

/retest

2 similar comments
@devigned
Contributor

/retest

@devigned
Contributor

/retest

@CecileRobertMichon
Contributor Author

The fix is now in all public cloud regions.

/hold cancel

@k8s-ci-robot removed the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Feb 22, 2021
@devigned
Contributor

/retest

@CecileRobertMichon
Contributor Author

/approve

@k8s-ci-robot
Contributor

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: CecileRobertMichon

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@k8s-ci-robot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Feb 22, 2021
@k8s-ci-robot removed the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Feb 22, 2021
@CecileRobertMichon
Contributor Author

rebased on top of #1186

Contributor

@devigned left a comment


/lgtm

@k8s-ci-robot added the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Feb 22, 2021
@devigned
Contributor

/retest

2 similar comments
@CecileRobertMichon
Contributor Author

/retest

@devigned
Contributor

/retest

@devigned
Contributor

/retest
IPv6 fail

@k8s-ci-robot added the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Feb 25, 2021
@k8s-ci-robot removed lgtm "Looks good to me", indicates that a PR is ready to be merged. needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. labels Feb 25, 2021
@CecileRobertMichon
Contributor Author

/retest

2 similar comments
@CecileRobertMichon
Contributor Author

/retest

@CecileRobertMichon
Contributor Author

/retest

This reverts commit c0d8e36.
@CecileRobertMichon
Contributor Author

rebased on top of #1197

@CecileRobertMichon
Contributor Author

CecileRobertMichon commented Mar 4, 2021

/retest

looking into the failure separately

Timed out after 300.114s.
Deployment default/web-windows failed
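The 300s figure above suggests the e2e run polls for the Deployment to become available and gives up after a fixed window. A minimal sketch of that kind of wait, assuming client-go; waitForDeploymentAvailable and the 5-minute budget are illustrative, not the e2e framework's actual helper:

package main

import (
	"context"
	"fmt"
	"time"

	appsv1 "k8s.io/api/apps/v1"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/apimachinery/pkg/util/wait"
	"k8s.io/client-go/kubernetes"
	"k8s.io/client-go/tools/clientcmd"
)

// waitForDeploymentAvailable polls until the Deployment reports as many
// available replicas as desired, or the timeout expires.
func waitForDeploymentAvailable(ctx context.Context, cs kubernetes.Interface, namespace, name string, timeout time.Duration) error {
	return wait.PollImmediate(10*time.Second, timeout, func() (bool, error) {
		d, err := cs.AppsV1().Deployments(namespace).Get(ctx, name, metav1.GetOptions{})
		if err != nil {
			return false, nil // treat lookup errors as transient and keep polling
		}
		return deploymentAvailable(d), nil
	})
}

func deploymentAvailable(d *appsv1.Deployment) bool {
	desired := int32(1)
	if d.Spec.Replicas != nil {
		desired = *d.Spec.Replicas
	}
	return d.Status.AvailableReplicas >= desired
}

func main() {
	config, err := clientcmd.BuildConfigFromFlags("", clientcmd.RecommendedHomeFile)
	if err != nil {
		panic(err)
	}
	cs := kubernetes.NewForConfigOrDie(config)
	// Roughly what the failed expectation above amounts to: web-windows must
	// become available within about five minutes or the test fails.
	if err := waitForDeploymentAvailable(context.Background(), cs, "default", "web-windows", 5*time.Minute); err != nil {
		fmt.Println("Deployment default/web-windows failed:", err)
	}
}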

@devigned
Contributor

devigned commented Mar 4, 2021

@CecileRobertMichon it looks like Windows VMSS is having problems with the LB, but the cluster itself provisioned successfully.

@CecileRobertMichon
Contributor Author

Digging into the error, here are the relevant logs:

I0304 22:33:35.721937       1 event.go:291] "Event occurred" object="default/web-elb" kind="Service" apiVersion="v1" type="Normal" reason="EnsuringLoadBalancer" message="Ensuring load balancer"
I0304 22:33:35.742696       1 azure_loadbalancer.go:677] reconcileLoadBalancer for service(default/web-elb) - wantLb(true): started
I0304 22:33:35.743157       1 endpoints_controller.go:349] Finished syncing service "default/web-elb" endpoints. (223.8µs)
I0304 22:33:35.743365       1 endpointslice_controller.go:297] Finished syncing service "default/web-elb" endpoint slices. (167.5µs)
I0304 22:33:35.749377       1 resource_quota_monitor.go:354] QuotaMonitor process object: /v1, Resource=endpoints, namespace default, name web-ilb, uid 4ed7f798-645f-4c28-8f7b-77463ac819f7, event type delete
I0304 22:33:35.749427       1 endpointslicemirroring_controller.go:273] syncEndpoints("default/web-ilb")
I0304 22:33:35.749450       1 endpointslicemirroring_controller.go:283] default/web-ilb Endpoints not found, cleaning up any mirrored EndpointSlices
I0304 22:33:35.749471       1 endpointslicemirroring_controller.go:270] Finished syncing EndpointSlices for "default/web-ilb" Endpoints. (45.7µs)
I0304 22:33:35.751497       1 endpoints_controller.go:349] Finished syncing service "default/web-ilb" endpoints. (35.989983ms)
I0304 22:33:35.754035       1 endpoints_controller.go:349] Finished syncing service "default/web-ilb" endpoints. (2.475305ms)
I0304 22:33:35.810087       1 azure_backoff.go:251] LoadBalancerClient.List(capz-e2e-v1z4gy) success
I0304 22:33:35.810124       1 azure_loadbalancer.go:685] reconcileLoadBalancer for service(default/web-elb): lb(capz-e2e-v1z4gy/capz-e2e-v1z4gy) wantLb(true) resolved load balancer name
I0304 22:33:35.817136       1 request.go:581] Throttling request took 85.396697ms, request: GET:https://10.0.0.4:6443/apis/apiextensions.k8s.io/v1?timeout=32s
I0304 22:33:35.832682       1 azure_wrap.go:109] Public IP "capz-e2e-v1z4gy-aea7219be404141198eff19df26913bd" not found
I0304 22:33:35.832762       1 azure_loadbalancer.go:542] ensurePublicIPExists for service(default/web-elb): pip(capz-e2e-v1z4gy-aea7219be404141198eff19df26913bd) - creating
I0304 22:33:35.832851       1 azure_loadbalancer.go:568] service(default/web-elb): pip(capz-e2e-v1z4gy-aea7219be404141198eff19df26913bd) - creating as ipv4 for clusterIP:10.102.207.34
I0304 22:33:35.832867       1 azure_loadbalancer.go:571] ensurePublicIPExists for service(default/web-elb): pip(capz-e2e-v1z4gy-aea7219be404141198eff19df26913bd) - creating
I0304 22:33:35.867104       1 request.go:581] Throttling request took 135.338112ms, request: GET:https://10.0.0.4:6443/apis/apiextensions.k8s.io/v1beta1?timeout=32s
I0304 22:33:35.917155       1 request.go:581] Throttling request took 185.373827ms, request: GET:https://10.0.0.4:6443/apis/scheduling.k8s.io/v1?timeout=32s
I0304 22:33:35.967130       1 request.go:581] Throttling request took 235.300842ms, request: GET:https://10.0.0.4:6443/apis/coordination.k8s.io/v1beta1?timeout=32s
I0304 22:33:36.018647       1 request.go:581] Throttling request took 286.80146ms, request: GET:https://10.0.0.4:6443/apis/coordination.k8s.io/v1?timeout=32s
I0304 22:33:36.067043       1 request.go:581] Throttling request took 335.180572ms, request: GET:https://10.0.0.4:6443/apis/node.k8s.io/v1beta1?timeout=32s
I0304 22:33:36.072926       1 garbagecollector.go:519] "Deleting object" object="default/web-ilb-mttnz" objectUID=7f1359d1-a7e6-4658-9ed6-8f2d1f8bca36 kind="EndpointSlice" propagationPolicy=Background
I0304 22:33:36.079710       1 resource_quota_monitor.go:354] QuotaMonitor process object: discovery.k8s.io/v1beta1, Resource=endpointslices, namespace default, name web-ilb-mttnz, uid 7f1359d1-a7e6-4658-9ed6-8f2d1f8bca36, event type delete
I0304 22:33:38.515205       1 azure_loadbalancer.go:1033] reconcileLoadBalancerRule lb name (capz-e2e-v1z4gy) rule name (aea7219be404141198eff19df26913bd-TCP-80)
I0304 22:33:38.515237       1 azure_loadbalancer.go:1033] reconcileLoadBalancerRule lb name (capz-e2e-v1z4gy) rule name (aea7219be404141198eff19df26913bd-TCP-443)
I0304 22:33:38.515265       1 azure_loadbalancer.go:956] reconcileLoadBalancer: reconcileLoadBalancer for service(default/web-elb): lb(capz-e2e-v1z4gy) - updating
I0304 22:33:38.946461       1 httplog.go:89] "HTTP" verb="GET" URI="/healthz" latency="75.5µs" userAgent="kube-probe/1.19" srcIP="127.0.0.1:46592" resp=200
I0304 22:33:40.875327       1 request.go:581] Throttling request took 93.734308ms, request: GET:https://10.0.0.4:6443/apis/apiextensions.k8s.io/v1beta1?timeout=32s
I0304 22:33:40.925399       1 request.go:581] Throttling request took 143.785712ms, request: GET:https://10.0.0.4:6443/apis/scheduling.k8s.io/v1?timeout=32s
I0304 22:33:40.975341       1 request.go:581] Throttling request took 193.714916ms, request: GET:https://10.0.0.4:6443/apis/scheduling.k8s.io/v1beta1?timeout=32s
I0304 22:33:41.025377       1 request.go:581] Throttling request took 239.143919ms, request: GET:https://10.0.0.4:6443/apis/coordination.k8s.io/v1beta1?timeout=32s
I0304 22:33:41.075373       1 request.go:581] Throttling request took 289.121422ms, request: GET:https://10.0.0.4:6443/apis/coordination.k8s.io/v1?timeout=32s
I0304 22:33:41.125324       1 request.go:581] Throttling request took 339.067226ms, request: GET:https://10.0.0.4:6443/apis/node.k8s.io/v1beta1?timeout=32s
I0304 22:33:41.126962       1 resource_quota_controller.go:423] no resource updates from discovery, skipping resource quota sync
I0304 22:33:41.193451       1 request.go:581] Throttling request took 90.833707ms, request: GET:https://10.0.0.4:6443/apis/coordination.k8s.io/v1?timeout=32s
I0304 22:33:41.243417       1 request.go:581] Throttling request took 140.783811ms, request: GET:https://10.0.0.4:6443/apis/apiextensions.k8s.io/v1beta1?timeout=32s
I0304 22:33:41.254298       1 node_lifecycle_controller.go:697] Node capz-e2e-v1z4gy-control-plane-k6sk5 was in a taint queue, but it's ready now. Ignoring taint request.
I0304 22:33:41.293537       1 request.go:581] Throttling request took 190.880215ms, request: GET:https://10.0.0.4:6443/apis/scheduling.k8s.io/v1?timeout=32s
I0304 22:33:41.343440       1 request.go:581] Throttling request took 240.767819ms, request: GET:https://10.0.0.4:6443/apis/scheduling.k8s.io/v1beta1?timeout=32s
I0304 22:33:41.393447       1 request.go:581] Throttling request took 290.735422ms, request: GET:https://10.0.0.4:6443/apis/node.k8s.io/v1beta1?timeout=32s
I0304 22:33:41.443432       1 request.go:581] Throttling request took 340.684826ms, request: GET:https://10.0.0.4:6443/apis/coordination.k8s.io/v1beta1?timeout=32s
I0304 22:33:43.960453       1 cronjob_controller.go:128] Found 0 jobs
I0304 22:33:43.960481       1 cronjob_controller.go:134] Found 0 groups
I0304 22:33:45.272480       1 reflector.go:515] k8s.io/client-go/informers/factory.go:134: Watch close - *v1.PodTemplate total 0 items received
I0304 22:33:48.845428       1 azure_vmss.go:931] ensuring node "capz-e2e-v1z4gy-mp-0000001" of scaleset "" in LB backendpool "/subscriptions/===REDACTED===/resourceGroups/capz-e2e-v1z4gy/providers/Microsoft.Network/loadBalancers/capz-e2e-v1z4gy/backendAddressPools/capz-e2e-v1z4gy"
I0304 22:33:48.845482       1 azure_vmss.go:184] Couldn't find VMSS VM with nodeName capz-e2e-v1z4gy-mp-0000001, refreshing the cache
I0304 22:33:48.935628       1 azure_vmss.go:931] ensuring node "win-p-win000001" of scaleset "" in LB backendpool "/subscriptions/===REDACTED===/resourceGroups/capz-e2e-v1z4gy/providers/Microsoft.Network/loadBalancers/capz-e2e-v1z4gy/backendAddressPools/capz-e2e-v1z4gy"
I0304 22:33:48.935679       1 azure_vmss.go:184] Couldn't find VMSS VM with nodeName win-p-win000001, refreshing the cache
I0304 22:33:48.947432       1 httplog.go:89] "HTTP" verb="GET" URI="/healthz" latency="111.5µs" userAgent="kube-probe/1.19" srcIP="127.0.0.1:46668" resp=200
I0304 22:33:48.988815       1 azure_vmss.go:1260] EnsureHostInPool begins to UpdateVMs for VMSS(capz-e2e-v1z4gy, win-p-win) with new backendPoolID /subscriptions/===REDACTED===/resourceGroups/capz-e2e-v1z4gy/providers/Microsoft.Network/loadBalancers/capz-e2e-v1z4gy/backendAddressPools/capz-e2e-v1z4gy
I0304 22:33:48.989407       1 azure_vmss.go:1260] EnsureHostInPool begins to UpdateVMs for VMSS(capz-e2e-v1z4gy, capz-e2e-v1z4gy-mp-0) with new backendPoolID /subscriptions/===REDACTED===/resourceGroups/capz-e2e-v1z4gy/providers/Microsoft.Network/loadBalancers/capz-e2e-v1z4gy/backendAddressPools/capz-e2e-v1z4gy
I0304 22:33:49.394061       1 reflector.go:515] k8s.io/client-go/informers/factory.go:134: Watch close - *v1.ReplicaSet total 18 items received
I0304 22:33:50.299613       1 reflector.go:369] k8s.io/client-go/informers/factory.go:134: forcing resync
I0304 22:33:50.361429       1 reflector.go:515] k8s.io/client-go/informers/factory.go:134: Watch close - *v1.CSIDriver total 0 items received
I0304 22:33:50.399076       1 pv_controller_base.go:523] resyncing PV controller
I0304 22:33:51.268968       1 node_lifecycle_controller.go:697] Node capz-e2e-v1z4gy-control-plane-k6sk5 was in a taint queue, but it's ready now. Ignoring taint request.
I0304 22:33:53.967027       1 cronjob_controller.go:128] Found 0 jobs
I0304 22:33:53.967054       1 cronjob_controller.go:134] Found 0 groups
I0304 22:33:55.433825       1 gc_controller.go:164] GC'ing orphaned
I0304 22:33:55.433860       1 gc_controller.go:227] GC'ing unscheduled pods which are terminating.
I0304 22:33:58.947128       1 httplog.go:89] "HTTP" verb="GET" URI="/healthz" latency="110.3µs" userAgent="kube-probe/1.19" srcIP="127.0.0.1:46754" resp=200
I0304 22:34:01.283564       1 node_lifecycle_controller.go:697] Node capz-e2e-v1z4gy-control-plane-k6sk5 was in a taint queue, but it's ready now. Ignoring taint request.
I0304 22:34:03.974929       1 cronjob_controller.go:128] Found 0 jobs
I0304 22:34:03.975034       1 cronjob_controller.go:134] Found 0 groups
I0304 22:34:05.290504       1 reflector.go:369] k8s.io/client-go/informers/factory.go:134: forcing resync
I0304 22:34:05.299889       1 reflector.go:369] k8s.io/client-go/informers/factory.go:134: forcing resync
I0304 22:34:05.399251       1 pv_controller_base.go:523] resyncing PV controller
I0304 22:34:05.451575       1 reflector.go:369] k8s.io/client-go/informers/factory.go:134: forcing resync
I0304 22:34:08.948899       1 httplog.go:89] "HTTP" verb="GET" URI="/healthz" latency="84.701µs" userAgent="kube-probe/1.19" srcIP="127.0.0.1:46838" resp=200
I0304 22:34:09.396277       1 azure_vmss.go:1052] ensureVMSSInPool: ensuring VMSS with backendPoolID /subscriptions/===REDACTED===/resourceGroups/capz-e2e-v1z4gy/providers/Microsoft.Network/loadBalancers/capz-e2e-v1z4gy/backendAddressPools/capz-e2e-v1z4gy
I0304 22:34:09.396414       1 azure_vmss.go:1082] ensureVMSSInPool begins to update VMSS map[capz-e2e-v1z4gy-mp-0:true win-p-win:true] with backendPoolID /subscriptions/===REDACTED===/resourceGroups/capz-e2e-v1z4gy/providers/Microsoft.Network/loadBalancers/capz-e2e-v1z4gy/backendAddressPools/capz-e2e-v1z4gy
I0304 22:34:09.396553       1 azure_vmss.go:1176] ensureVMSSInPool begins to update vmss(capz-e2e-v1z4gy-mp-0) with new backendPoolID /subscriptions/===REDACTED===/resourceGroups/capz-e2e-v1z4gy/providers/Microsoft.Network/loadBalancers/capz-e2e-v1z4gy/backendAddressPools/capz-e2e-v1z4gy
I0304 22:34:09.396636       1 azure_backoff.go:408] CreateOrUpdateVMSS: verify the status of the vmss being created or updated
I0304 22:34:11.227305       1 request.go:581] Throttling request took 93.677696ms, request: GET:https://10.0.0.4:6443/apis/apiextensions.k8s.io/v1beta1?timeout=32s
I0304 22:34:11.277296       1 request.go:581] Throttling request took 143.693548ms, request: GET:https://10.0.0.4:6443/apis/scheduling.k8s.io/v1?timeout=32s
I0304 22:34:11.297925       1 node_lifecycle_controller.go:697] Node capz-e2e-v1z4gy-control-plane-k6sk5 was in a taint queue, but it's ready now. Ignoring taint request.
I0304 22:34:11.327229       1 request.go:581] Throttling request took 193.6401ms, request: GET:https://10.0.0.4:6443/apis/scheduling.k8s.io/v1beta1?timeout=32s
I0304 22:34:11.377219       1 request.go:581] Throttling request took 243.612051ms, request: GET:https://10.0.0.4:6443/apis/coordination.k8s.io/v1?timeout=32s
I0304 22:34:11.427220       1 request.go:581] Throttling request took 293.590903ms, request: GET:https://10.0.0.4:6443/apis/coordination.k8s.io/v1beta1?timeout=32s
I0304 22:34:11.481596       1 request.go:581] Throttling request took 347.959059ms, request: GET:https://10.0.0.4:6443/apis/node.k8s.io/v1beta1?timeout=32s
I0304 22:34:11.483901       1 resource_quota_controller.go:423] no resource updates from discovery, skipping resource quota sync
I0304 22:34:11.545429       1 request.go:581] Throttling request took 93.357196ms, request: GET:https://10.0.0.4:6443/apis/authorization.k8s.io/v1beta1?timeout=32s
I0304 22:34:11.595368       1 request.go:581] Throttling request took 143.339048ms, request: GET:https://10.0.0.4:6443/apis/autoscaling/v1?timeout=32s
I0304 22:34:11.645422       1 request.go:581] Throttling request took 193.328399ms, request: GET:https://10.0.0.4:6443/apis/autoscaling/v2beta1?timeout=32s
I0304 22:34:11.695392       1 request.go:581] Throttling request took 243.324251ms, request: GET:https://10.0.0.4:6443/apis/batch/v1beta1?timeout=32s
I0304 22:34:11.745372       1 request.go:581] Throttling request took 293.275302ms, request: GET:https://10.0.0.4:6443/apis/batch/v1?timeout=32s
I0304 22:34:11.795929       1 request.go:581] Throttling request took 343.812255ms, request: GET:https://10.0.0.4:6443/apis/certificates.k8s.io/v1?timeout=32s
I0304 22:34:13.982410       1 cronjob_controller.go:128] Found 0 jobs
I0304 22:34:13.982454       1 cronjob_controller.go:134] Found 0 groups
I0304 22:34:15.434049       1 gc_controller.go:164] GC'ing orphaned
I0304 22:34:15.434102       1 gc_controller.go:227] GC'ing unscheduled pods which are terminating.
I0304 22:34:16.382825       1 reflector.go:515] k8s.io/client-go/informers/factory.go:134: Watch close - *v1.Secret total 54 items received
I0304 22:34:18.950251       1 httplog.go:89] "HTTP" verb="GET" URI="/healthz" latency="104.7µs" userAgent="kube-probe/1.19" srcIP="127.0.0.1:46934" resp=200
I0304 22:34:20.300097       1 reflector.go:369] k8s.io/client-go/informers/factory.go:134: forcing resync
I0304 22:34:20.399555       1 pv_controller_base.go:523] resyncing PV controller
I0304 22:34:21.286623       1 reflector.go:515] k8s.io/client-go/informers/factory.go:134: Watch close - *v1.Service total 13 items received
I0304 22:34:21.330068       1 node_lifecycle_controller.go:697] Node capz-e2e-v1z4gy-control-plane-k6sk5 was in a taint queue, but it's ready now. Ignoring taint request.
I0304 22:34:23.988349       1 cronjob_controller.go:128] Found 0 jobs
I0304 22:34:23.988389       1 cronjob_controller.go:134] Found 0 groups
I0304 22:34:27.352644       1 reflector.go:515] k8s.io/client-go/informers/factory.go:134: Watch close - *v1beta1.CronJob total 0 items received
I0304 22:34:28.303279       1 reflector.go:515] k8s.io/client-go/informers/factory.go:134: Watch close - *v1.DaemonSet total 36 items received
I0304 22:34:28.946473       1 httplog.go:89] "HTTP" verb="GET" URI="/healthz" latency="84.9µs" userAgent="kube-probe/1.19" srcIP="127.0.0.1:47060" resp=200
I0304 22:34:30.262337       1 reflector.go:515] k8s.io/client-go/informers/factory.go:134: Watch close - *v1.Endpoints total 535 items received
I0304 22:34:30.383263       1 reflector.go:515] k8s.io/client-go/informers/factory.go:134: Watch close - *v1.ServiceAccount total 85 items received
I0304 22:34:31.353992       1 node_lifecycle_controller.go:697] Node capz-e2e-v1z4gy-control-plane-k6sk5 was in a taint queue, but it's ready now. Ignoring taint request.
I0304 22:34:31.674743       1 reflector.go:515] k8s.io/client-go/informers/factory.go:134: Watch close - *v1beta1.RuntimeClass total 0 items received
I0304 22:34:33.310076       1 reflector.go:515] k8s.io/client-go/informers/factory.go:134: Watch close - *v1.Role total 0 items received
I0304 22:34:33.994675       1 cronjob_controller.go:128] Found 0 jobs
I0304 22:34:33.994702       1 cronjob_controller.go:134] Found 0 groups
I0304 22:34:34.614499       1 reflector.go:515] k8s.io/client-go/informers/factory.go:134: Watch close - *v1.MutatingWebhookConfiguration total 0 items received
I0304 22:34:35.290711       1 reflector.go:369] k8s.io/client-go/informers/factory.go:134: forcing resync
I0304 22:34:35.300309       1 reflector.go:369] k8s.io/client-go/informers/factory.go:134: forcing resync
I0304 22:34:35.399733       1 pv_controller_base.go:523] resyncing PV controller
I0304 22:34:35.434233       1 gc_controller.go:164] GC'ing orphaned
I0304 22:34:35.434369       1 gc_controller.go:227] GC'ing unscheduled pods which are terminating.
I0304 22:34:35.451777       1 reflector.go:369] k8s.io/client-go/informers/factory.go:134: forcing resync
I0304 22:34:37.270701       1 reflector.go:515] k8s.io/client-go/informers/factory.go:134: Watch close - *v1.CSINode total 12 items received
I0304 22:34:38.945276       1 httplog.go:89] "HTTP" verb="GET" URI="/healthz" latency="73.5µs" userAgent="kube-probe/1.19" srcIP="127.0.0.1:47160" resp=200
I0304 22:34:41.368216       1 node_lifecycle_controller.go:697] Node capz-e2e-v1z4gy-control-plane-k6sk5 was in a taint queue, but it's ready now. Ignoring taint request.
I0304 22:34:41.584138       1 request.go:581] Throttling request took 94.567432ms, request: GET:https://10.0.0.4:6443/apis/apiextensions.k8s.io/v1beta1?timeout=32s
I0304 22:34:41.634195       1 request.go:581] Throttling request took 144.612349ms, request: GET:https://10.0.0.4:6443/apis/scheduling.k8s.io/v1?timeout=32s
I0304 22:34:41.684210       1 request.go:581] Throttling request took 194.608566ms, request: GET:https://10.0.0.4:6443/apis/scheduling.k8s.io/v1beta1?timeout=32s
I0304 22:34:41.734138       1 request.go:581] Throttling request took 244.521783ms, request: GET:https://10.0.0.4:6443/apis/coordination.k8s.io/v1?timeout=32s
I0304 22:34:41.784207       1 request.go:581] Throttling request took 293.781299ms, request: GET:https://10.0.0.4:6443/apis/node.k8s.io/v1beta1?timeout=32s
I0304 22:34:41.834274       1 request.go:581] Throttling request took 340.076615ms, request: GET:https://10.0.0.4:6443/apis/coordination.k8s.io/v1beta1?timeout=32s
I0304 22:34:41.836183       1 resource_quota_controller.go:423] no resource updates from discovery, skipping resource quota sync
I0304 22:34:41.898650       1 request.go:581] Throttling request took 94.814332ms, request: GET:https://10.0.0.4:6443/apis/scheduling.k8s.io/v1beta1?timeout=32s
I0304 22:34:41.948612       1 request.go:581] Throttling request took 144.770649ms, request: GET:https://10.0.0.4:6443/apis/apiextensions.k8s.io/v1beta1?timeout=32s
I0304 22:34:41.998612       1 request.go:581] Throttling request took 194.745866ms, request: GET:https://10.0.0.4:6443/apis/scheduling.k8s.io/v1?timeout=32s
I0304 22:34:42.048636       1 request.go:581] Throttling request took 244.751782ms, request: GET:https://10.0.0.4:6443/apis/coordination.k8s.io/v1beta1?timeout=32s
I0304 22:34:42.098648       1 request.go:581] Throttling request took 294.732999ms, request: GET:https://10.0.0.4:6443/apis/coordination.k8s.io/v1?timeout=32s
I0304 22:34:42.149324       1 request.go:581] Throttling request took 345.393216ms, request: GET:https://10.0.0.4:6443/apis/node.k8s.io/v1beta1?timeout=32s
I0304 22:34:43.747457       1 reflector.go:515] k8s.io/client-go/informers/factory.go:134: Watch close - *v1.IngressClass total 0 items received
I0304 22:34:44.001124       1 cronjob_controller.go:128] Found 0 jobs
I0304 22:34:44.001153       1 cronjob_controller.go:134] Found 0 groups
I0304 22:34:46.944333       1 azure_vmss.go:1176] ensureVMSSInPool begins to update vmss(win-p-win) with new backendPoolID /subscriptions/===REDACTED===/resourceGroups/capz-e2e-v1z4gy/providers/Microsoft.Network/loadBalancers/capz-e2e-v1z4gy/backendAddressPools/capz-e2e-v1z4gy
I0304 22:34:46.944374       1 azure_backoff.go:408] CreateOrUpdateVMSS: verify the status of the vmss being created or updated
E0304 22:34:47.040394       1 azure_backoff.go:422] CreateOrUpdateVMSS: error CreateOrUpdate vmss(win-p-win): &{false 409 0001-01-01 00:00:00 +0000 UTC Retriable: false, RetryAfter: 0s, HTTPStatusCode: 409, RawError: {"error":{"code":"ResourceGroupBeingDeleted","message":"The resource group 'capz-e2e-v1z4gy' is in deprovisioning state and cannot perform this operation."}}}
E0304 22:34:47.040461       1 azure_vmss.go:1179] ensureVMSSInPool CreateOrUpdateVMSS(win-p-win) with new backendPoolID /subscriptions/===REDACTED===/resourceGroups/capz-e2e-v1z4gy/providers/Microsoft.Network/loadBalancers/capz-e2e-v1z4gy/backendAddressPools/capz-e2e-v1z4gy, err: <nil>
E0304 22:34:47.040690       1 azure_loadbalancer.go:162] reconcileLoadBalancer(default/web-elb) failed: Retriable: false, RetryAfter: 0s, HTTPStatusCode: 409, RawError: Retriable: false, RetryAfter: 0s, HTTPStatusCode: 409, RawError: {"error":{"code":"ResourceGroupBeingDeleted","message":"The resource group 'capz-e2e-v1z4gy' is in deprovisioning state and cannot perform this operation."}}
I0304 22:34:47.040735       1 controller.go:789] Finished syncing service "default/web-elb" (1m11.319330706s)
E0304 22:34:47.040753       1 controller.go:275] error processing service default/web-elb (will retry): failed to ensure load balancer: Retriable: false, RetryAfter: 0s, HTTPStatusCode: 409, RawError: Retriable: false, RetryAfter: 0s, HTTPStatusCode: 409, RawError: {"error":{"code":"ResourceGroupBeingDeleted","message":"The resource group 'capz-e2e-v1z4gy' is in deprovisioning state and cannot perform this operation."}}

So the ILB service got created and deleted successfully, but the ELB service times out. It looks like the LB rules were added successfully, but then CreateOrUpdateVMSS loops on "CreateOrUpdateVMSS: verify the status of the vmss being created or updated" until the test eventually times out and starts cleaning up the cluster, at which point it fails with ResourceGroupBeingDeleted (expected).
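Roughly, the shape of that loop (an illustrative sketch only, not the cloud provider's actual azure_backoff.go code): the reconcile keeps retrying the VMSS update until it succeeds or the surrounding deadline expires, so once cleanup starts deleting the resource group every retry comes back with the 409 above.

package main

import (
	"context"
	"errors"
	"fmt"
	"time"
)

// updateVMSS stands in for the Azure call that attaches the scale set to the
// LB backend pool; stubbed here to fail the way the logs above show.
func updateVMSS(ctx context.Context) error {
	return errors.New("HTTPStatusCode: 409, RawError: ResourceGroupBeingDeleted")
}

// createOrUpdateVMSSWithRetry illustrates the "verify the status of the vmss
// being created or updated" behavior: retry the update until it succeeds or
// the caller's deadline (here, the test timeout) is reached.
func createOrUpdateVMSSWithRetry(ctx context.Context) error {
	const backoff = 2 * time.Second
	for {
		err := updateVMSS(ctx)
		if err == nil {
			return nil
		}
		select {
		case <-ctx.Done():
			// The deadline expired (or cleanup began); surface the last error.
			return fmt.Errorf("giving up: %w", err)
		case <-time.After(backoff):
			// keep retrying
		}
	}
}

func main() {
	ctx, cancel := context.WithTimeout(context.Background(), 10*time.Second)
	defer cancel()
	fmt.Println(createOrUpdateVMSSWithRetry(ctx))
}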

Looking at activity logs for the RG I don't see any PUT errors, but I do see a bunch of VMSS writes. @devigned does any of this speak to you?

@devigned
Contributor

devigned commented Mar 5, 2021

Is it possible this is related to the VMSS extension being PUT repeatedly? https://kubernetes.slack.com/archives/CEX9HENG7/p1614951175153700?thread_ts=1614873280.120900&cid=CEX9HENG7

@CecileRobertMichon
Contributor Author

Is it possible this is related to the VMSS extension being PUT repeatedly? https://kubernetes.slack.com/archives/CEX9HENG7/p1614951175153700?thread_ts=1614873280.120900&cid=CEX9HENG7

Yes, I think it might be. I don't see how that would affect the LB reconcile though. The extension should be a no-op... I need to look into it.
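One way repeated extension PUTs could be made a true no-op is to diff the desired extension against what the VMSS already carries and skip the update when nothing changed. A minimal sketch of that idempotency check; vmssExtension and needsUpdate are hypothetical stand-ins, not the CAPZ or cloud-provider types:

package main

import (
	"fmt"
	"reflect"
)

// vmssExtension is a simplified stand-in for the Azure VMSS extension model.
type vmssExtension struct {
	Name      string
	Publisher string
	Type      string
	Version   string
	Settings  map[string]string
}

// needsUpdate reports whether the desired extension differs from the one the
// scale set already has; if it does not, the reconciler can skip the PUT and
// avoid triggering another VMSS update.
func needsUpdate(existing, desired vmssExtension) bool {
	return !reflect.DeepEqual(existing, desired)
}

func main() {
	existing := vmssExtension{
		Name:      "CustomScript",
		Publisher: "Microsoft.Compute",
		Type:      "CustomScriptExtension",
		Version:   "1.10",
		Settings:  map[string]string{"commandToExecute": "echo bootstrap"},
	}
	desired := existing // unchanged spec: the PUT can be skipped
	fmt.Println("needs update:", needsUpdate(existing, desired))
}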

@CecileRobertMichon
Contributor Author

Opened #1221 and #1217

/retest

@nader-ziada
Contributor

/test pull-cluster-api-provider-azure-e2e-windows

1 similar comment
@nader-ziada
Contributor

/test pull-cluster-api-provider-azure-e2e-windows

@nader-ziada
Contributor

another try just to make sure

/test pull-cluster-api-provider-azure-e2e-windows

@nader-ziada
Contributor

/lgtm

@k8s-ci-robot added the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Mar 15, 2021
@nader-ziada
Contributor

The last run failed on the Azure authentication issue; it should be fixed now.

/test pull-cluster-api-provider-azure-e2e

@nader-ziada
Contributor

/test pull-cluster-api-provider-azure-e2e-windows

Development

Successfully merging this pull request may close these issues.

E2E failure: LB GET failed after 5 attempts