[BUG] Duplicate LB allocation happens during upgrading #4329

Closed
yaocw2020 opened this issue Jul 28, 2023 · 6 comments

@yaocw2020
Contributor

yaocw2020 commented Jul 28, 2023

Describe the bug

A duplicate Harvester load balancer IP allocation error occurs while upgrading from Harvester v1.1.2 to Harvester v1.2.0.

2023-07-27T07:41:25.859926166Z time="2023-07-27T07:41:25Z" level=error msg="error syncing 'default/kubernetes-default-nginx-fa60cbe7': handler harvester-lb-controller: allocate ip for lb default/kubernetes-default-nginx-fa60cbe7 failed, error: 10.84.103.205 has been allocated to default/kubernetes-default-nginx-fa60cbe7, duplicate allocation is not allowed, requeuing"
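
As a reference point, the LB objects and the addresses they already hold can be inspected on the Harvester cluster while the error repeats. This is a sketch only and assumes the LoadBalancer CRD is served under the loadbalancer.harvesterhci.io group; confirm the exact resource name with kubectl api-resources first:

# On the Harvester (host) cluster; confirm the exact CRD name before relying on it.
kubectl api-resources | grep -i loadbalancer
# List the LB objects and inspect the one named in the error above.
kubectl get loadbalancers.loadbalancer.harvesterhci.io -A
kubectl get loadbalancers.loadbalancer.harvesterhci.io -n default kubernetes-default-nginx-fa60cbe7 -o yaml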

To Reproduce
Steps to reproduce the behavior:

  1. Spin up a Harvester v1.1.2 cluster.
  2. Import Harvester into Rancher v2.6.x or v2.7.4.
  3. Spin up a guest RKE2 cluster with the Harvester cloud provider < v0.2.0 (use an RKE2 release based on Kubernetes v1.23.x, or on a version earlier than v1.24.15, v1.25.11, or v1.26.6).
  4. Configure a VIP pool.
  5. Create a load balancer service with the pool IPAM mode (a minimal example manifest is sketched after this list).
  6. Upgrade Harvester from v1.1.2 to v1.2.0-rc4.
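
For step 5, a minimal sketch of such a service manifest. The pool IPAM annotation is the one shown later in this thread; the name, selector, and ports are placeholders:

apiVersion: v1
kind: Service
metadata:
  name: nginx-lb                     # placeholder name
  namespace: default
  annotations:
    # ask the Harvester cloud provider to allocate the external IP from the VIP pool
    cloudprovider.harvesterhci.io/ipam: pool
spec:
  type: LoadBalancer
  selector:
    app: nginx                       # placeholder; must match the workload's labels
  ports:
    - name: http
      port: 8081
      targetPort: 80
      protocol: TCP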

Expected behavior

The LB should keep its original IP from the pool, without errors, during the upgrade.

Support bundle

Environment

  • Harvester ISO version: v1.1.2 to v1.2.0-rc4
  • Underlying Infrastructure (e.g. Baremetal with Dell PowerEdge R630): Baremetal with Dell PowerEdge R630

Additional context

@yaocw2020 yaocw2020 added kind/bug Issues that are defects reported by users or that we know have reached a real release reproduce/needed Reminder to add a reproduce label and to remove this one severity/needed Reminder to add a severity label and to remove this one area/load-balancer labels Jul 28, 2023
@yaocw2020 yaocw2020 self-assigned this Jul 28, 2023
@yaocw2020 yaocw2020 added this to the v1.2.0 milestone Jul 28, 2023
@guangbochen guangbochen added priority/1 Highly recommended to fix in this release not-require/test-plan Skip to create a e2e automation test issue required-for-rc/v1.2.0 labels Jul 30, 2023
@guangbochen guangbochen changed the title [BUG] Duplicate allocation happens during upgrading [BUG] Duplicate LB allocation happens during upgrading Jul 30, 2023
@guangbochen guangbochen added reproduce/always Reproducible 100% of the time severity/2 Function working but has a major issue w/o workaround (a major incident with significant impact) and removed reproduce/needed Reminder to add a reproduce label and to remove this one severity/needed Reminder to add a severity label and to remove this one labels Jul 30, 2023
@harvesterhci-io-github-bot

harvesterhci-io-github-bot commented Aug 4, 2023

Pre Ready-For-Testing Checklist

* [ ] If labeled: require/HEP Has the Harvester Enhancement Proposal PR submitted?
The HEP PR is at:

* [ ] Is there a workaround for the issue? If so, where is it documented?
The workaround is at:

  • Has the backend code been merged (harvester, harvester-installer, etc.) (including backport-needed/*)?
    The PR is at: Record the allocated IPs to allocated history during upgrading load-balancer-harvester#18

    • Does the PR include the explanation for the fix or the feature?

    * [ ] Does the PR include deployment change (YAML/Chart)? If so, where are the PRs for both YAML file and Chart?
    The PR for the YAML change is at:
    The PR for the chart change is at:

* [ ] If labeled: area/ui Has the UI issue filed or ready to be merged?
The UI issue/PR is at:

* [ ] If labeled: require/doc, require/knowledge-base Has the necessary document PR submitted or merged?
The documentation/KB PR is at:

* [ ] If NOT labeled: not-require/test-plan Has the e2e test plan been merged? Have QAs agreed on the automation test case? If only test case skeleton w/o implementation, have you created an implementation issue?
- The automation skeleton PR is at:
- The automation test case PR is at:

* [ ] If the fix introduces the code for backward compatibility Has a separate issue been filed with the label release/obsolete-compatibility?
The compatibility issue is filed at:

@irishgordo irishgordo self-assigned this Aug 8, 2023
@irishgordo

@yaocw2020 I had a few small questions surrounding the test setup.

Validation question:

Can the Rancher instance be a Docker-based installation of Rancher for testing, in conjunction with the ipxe-examples repo?

On v1.1.2, on an RKE2 guest cluster (a single-node VM on Harvester), I'm running into an issue where the load balancer seems to be stuck in a Pending state.
It was created from a deployment of nginx:latest (on container-0), with the ports specifying a service type of LoadBalancer and IPAM set to Pool.

The VIP of the Harvester cluster is 192.168.0.131.
The vip-pools configured under Settings are:

{
  "default": "192.168.0.10-192.168.0.60"
}
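
For reference, the same value can be read back from the Harvester cluster. A sketch only, assuming the pool is still stored in the vip-pools setting on v1.1.2:

# On the Harvester cluster (v1.1.2); adjust if the setting name differs in your version.
kubectl get settings.harvesterhci.io vip-pools -o jsonpath='{.value}'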

And the load balancer does seem to be stuck with "Error syncing load balancer":

Name:                     k-loadbalancer                                                                                       
Namespace:                default                                                                                              
Labels:                   <none>                                                                                               
Annotations:              cloudprovider.harvesterhci.io/ipam: pool                                                             
                          field.cattle.io/targetWorkloadIds: ["default/k"]                                                     
                          management.cattle.io/ui-managed: true                                                                
Selector:                 workload.user.cattle.io/workloadselector=apps.deployment-default-k                                   
Type:                     LoadBalancer                                                                                         
IP Family Policy:         SingleStack                                                                                          
IP Families:              IPv4                                                                                                 
IP:                       10.43.181.164                                                                                        
IPs:                      10.43.181.164                                                                                        
Port:                     s  8081/TCP                                                                                          
TargetPort:               80/TCP                                                                                               
NodePort:                 s  32373/TCP                                                                                         
Endpoints:                10.42.179.152:80                                                                                     
Session Affinity:         None                                                                                                 
External Traffic Policy:  Cluster                                                                                              
Events:                                                                                                                        
  Type     Reason                  Age                    From                Message                                          
  ----     ------                  ----                   ----                -------                                          
  Normal   EnsuringLoadBalancer    2m15s (x7 over 7m30s)  service-controller  Ensuring load balancer                           
  Warning  SyncLoadBalancerFailed  2m15s (x7 over 7m30s)  service-controller  Error syncing load balancer: failed to ensure load balancer: create or update lb default/kubernetes-default-k-loadbalancer-0a9d6792 failed, error: the server could not find the requested resource (post loadbalancers.meta.k8s.io)
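
Since the discussion below points at the cloud provider version shipped in the guest cluster, one quick check (standard kubectl, run against the guest cluster) is the image that the harvester-cloud-provider deployment is actually running:

# On the guest RKE2 cluster: confirm which harvester-cloud-provider image/version is deployed.
kubectl -n kube-system get deployment harvester-cloud-provider \
  -o jsonpath='{.spec.template.spec.containers[0].image}{"\n"}'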

@irishgordo

Additionally, here are some of the logs:

╭─mike at suse-workstation-team-harvester in ~
╰─○ kubectl get helmchartconfigs.helm.cattle.io -n kube-system harvester-cloud-provider -o yaml 
apiVersion: helm.cattle.io/v1
kind: HelmChartConfig
metadata:
  annotations:
    objectset.rio.cattle.io/applied: H4sIAAAAAAAA/4SRPU/DMBCG/wq6OanbNE2KJQbUBYmFATF1udiXxsQfkX0JQlX/O0oLEgylo2W/j5/37gg4mDeKyQQPEjqybqGQ2dLCBDGtIIPeeA0Snsi6XYeRd8G35gAZOGLUyAjyCOh9YGQTfJqPoXknxYl4EU34BTQzCbKr9+HDU8wPUw8S+nX6o5LdPRuvHx61Dv4mwqMjkODQ44F0rmbxXP2Y386mAdUM6MeG8vSZmBycMlCRziVfjaPE6AaQfrQ2A4sN2X+rd5g6kKDbulpWShcF1m19vyqrdYtYl6ttuVxvN6rcNEWltzj/9l2iwzhRYoq5smHU+RDDZDRFuLy4opoGUrPPhHaktAueyTNIOO7hjLms8QW524Pcg5gwCmsaEdGrjqKIPRWCWInL1PLWWEriHIXT6SsAAP//kKHxnTgCAAA
    objectset.rio.cattle.io/id: ""
    objectset.rio.cattle.io/owner-gvk: k3s.cattle.io/v1, Kind=Addon
    objectset.rio.cattle.io/owner-name: managed-chart-config
    objectset.rio.cattle.io/owner-namespace: kube-system
  creationTimestamp: "2023-08-10T18:15:24Z"
  generation: 1
  labels:
    objectset.rio.cattle.io/hash: df7606cd22a7f791463faa741840385c45b26d8a
  name: harvester-cloud-provider
  namespace: kube-system
  resourceVersion: "303"
  uid: 71e85294-1cd6-4945-a2c3-4454ae76778b
spec:
  valuesContent: '{"cloudConfigPath":"/var/lib/rancher/rke2/etc/config-files/cloud-provider-config","clusterName":"b","global":{"cattle":{"clusterId":"c-m-6lqf54hp"}}}'
╭─mike at suse-workstation-team-harvester in ~
╰─○ kubectl get pods -n kube-system | grep -ie "cloud-provider"
harvester-cloud-provider-59947c686d-nfkkv               1/1     Running     0          29m
helm-install-harvester-cloud-provider-8r42l             0/1     Completed   0          29m
╭─mike at suse-workstation-team-harvester in ~
╰─○ kubectl logs -n kube-system harvester-cloud-provider-59947c686d-nfkkv --follow
I0810 18:16:36.155187       1 serving.go:348] Generated self-signed cert in-memory
W0810 18:16:36.330352       1 client_config.go:617] Neither --kubeconfig nor --master was specified.  Using the inClusterConfig.  This might not work.
W0810 18:16:36.333381       1 main.go:85] detected a cluster without a ClusterID.  A ClusterID will be required in the future.  Please tag your cluster to avoid any future issues
I0810 18:16:36.333559       1 controllermanager.go:145] Version: v0.0.0-master+$Format:%H$
I0810 18:16:36.334561       1 secure_serving.go:210] Serving securely on [::]:10258
I0810 18:16:36.334660       1 tlsconfig.go:240] "Starting DynamicServingCertificateController"
I0810 18:16:36.334983       1 leaderelection.go:248] attempting to acquire leader lease kube-system/cloud-controller-manager...
I0810 18:16:36.345946       1 leaderelection.go:258] successfully acquired lease kube-system/cloud-controller-manager
I0810 18:16:36.346986       1 event.go:294] "Event occurred" object="kube-system/cloud-controller-manager" fieldPath="" kind="Lease" apiVersion="coordination.k8s.io/v1" type="Normal" reason="LeaderElection" message="b-pool1-24e22b1d-lq5mg_ad86693d-4e93-4e90-b4d0-d2d14cbb01e2 became leader"
W0810 18:16:36.655251       1 core.go:111] --configure-cloud-routes is set, but cloud provider does not support routes. Will not configure cloud provider routes.
W0810 18:16:36.655568       1 controllermanager.go:289] Skipping "route"
I0810 18:16:36.655883       1 controllermanager.go:301] Started "cloud-node"
I0810 18:16:36.655969       1 controllermanager.go:301] Started "cloud-node-lifecycle"
I0810 18:16:36.656137       1 controllermanager.go:301] Started "service"
I0810 18:16:36.656437       1 node_controller.go:157] Sending events to api server.
I0810 18:16:36.656604       1 node_controller.go:166] Waiting for informer caches to sync
I0810 18:16:36.656686       1 node_lifecycle_controller.go:113] Sending events to api server
I0810 18:16:36.656823       1 controller.go:237] Starting service controller
I0810 18:16:36.656897       1 shared_informer.go:252] Waiting for caches to sync for service
I0810 18:16:36.757235       1 shared_informer.go:259] Caches are synced for service
I0810 18:16:36.757282       1 node_controller.go:415] Initializing node b-pool1-24e22b1d-lq5mg with cloud provider
I0810 18:16:36.796419       1 node_controller.go:484] Successfully initialized node b-pool1-24e22b1d-lq5mg with cloud provider
I0810 18:16:36.796905       1 event.go:294] "Event occurred" object="b-pool1-24e22b1d-lq5mg" fieldPath="" kind="Node" apiVersion="v1" type="Normal" reason="Synced" message="Node synced successfully"
time="2023-08-10T18:16:36Z" level=info msg="Starting kubevirt.io/v1, Kind=VirtualMachineInstance controller"
time="2023-08-10T18:16:36Z" level=info msg="Starting /v1, Kind=Node controller"
time="2023-08-10T18:16:36Z" level=info msg="Starting /v1, Kind=Service controller"
I0810 18:31:02.716076       1 event.go:294] "Event occurred" object="default/t-loadbalancer" fieldPath="" kind="Service" apiVersion="v1" type="Normal" reason="EnsuringLoadBalancer" message="Ensuring load balancer"
I0810 18:31:02.787273       1 event.go:294] "Event occurred" object="default/t-loadbalancer" fieldPath="" kind="Service" apiVersion="v1" type="Warning" reason="SyncLoadBalancerFailed" message="Error syncing load balancer: failed to ensure load balancer: create or update lb test1/kubernetes-default-t-loadbalancer-9c0294f6 failed, error: the server could not find the requested resource (post loadbalancers.meta.k8s.io)"
E0810 18:31:02.788078       1 controller.go:320] error processing service default/t-loadbalancer (will retry): failed to ensure load balancer: create or update lb test1/kubernetes-default-t-loadbalancer-9c0294f6 failed, error: the server could not find the requested resource (post loadbalancers.meta.k8s.io)
I0810 18:31:07.789727       1 event.go:294] "Event occurred" object="default/t-loadbalancer" fieldPath="" kind="Service" apiVersion="v1" type="Normal" reason="EnsuringLoadBalancer" message="Ensuring load balancer"
E0810 18:31:07.793597       1 controller.go:320] error processing service default/t-loadbalancer (will retry): failed to ensure load balancer: create or update lb test1/kubernetes-default-t-loadbalancer-9c0294f6 failed, error: the server could not find the requested resource (post loadbalancers.meta.k8s.io)
I0810 18:31:07.793636       1 event.go:294] "Event occurred" object="default/t-loadbalancer" fieldPath="" kind="Service" apiVersion="v1" type="Warning" reason="SyncLoadBalancerFailed" message="Error syncing load balancer: failed to ensure load balancer: create or update lb test1/kubernetes-default-t-loadbalancer-9c0294f6 failed, error: the server could not find the requested resource (post loadbalancers.meta.k8s.io)"
I0810 18:31:17.799480       1 event.go:294] "Event occurred" object="default/t-loadbalancer" fieldPath="" kind="Service" apiVersion="v1" type="Normal" reason="EnsuringLoadBalancer" message="Ensuring load balancer"
E0810 18:31:17.815807       1 controller.go:320] error processing service default/t-loadbalancer (will retry): failed to ensure load balancer: create or update lb test1/kubernetes-default-t-loadbalancer-9c0294f6 failed, error: the server could not find the requested resource (post loadbalancers.meta.k8s.io)
I0810 18:31:17.816635       1 event.go:294] "Event occurred" object="default/t-loadbalancer" fieldPath="" kind="Service" apiVersion="v1" type="Warning" reason="SyncLoadBalancerFailed" message="Error syncing load balancer: failed to ensure load balancer: create or update lb test1/kubernetes-default-t-loadbalancer-9c0294f6 failed, error: the server could not find the requested resource (post loadbalancers.meta.k8s.io)"
I0810 18:31:37.819161       1 event.go:294] "Event occurred" object="default/t-loadbalancer" fieldPath="" kind="Service" apiVersion="v1" type="Normal" reason="EnsuringLoadBalancer" message="Ensuring load balancer"
E0810 18:31:37.820666       1 controller.go:320] error processing service default/t-loadbalancer (will retry): failed to ensure load balancer: create or update lb test1/kubernetes-default-t-loadbalancer-9c0294f6 failed, error: the server could not find the requested resource (post loadbalancers.meta.k8s.io)
I0810 18:31:37.820709       1 event.go:294] "Event occurred" object="default/t-loadbalancer" fieldPath="" kind="Service" apiVersion="v1" type="Warning" reason="SyncLoadBalancerFailed" message="Error syncing load balancer: failed to ensure load balancer: create or update lb test1/kubernetes-default-t-loadbalancer-9c0294f6 failed, error: the server could not find the requested resource (post loadbalancers.meta.k8s.io)"
I0810 18:32:17.825060       1 event.go:294] "Event occurred" object="default/t-loadbalancer" fieldPath="" kind="Service" apiVersion="v1" type="Normal" reason="EnsuringLoadBalancer" message="Ensuring load balancer"
E0810 18:32:17.828485       1 controller.go:320] error processing service default/t-loadbalancer (will retry): failed to ensure load balancer: create or update lb test1/kubernetes-default-t-loadbalancer-9c0294f6 failed, error: the server could not find the requested resource (post loadbalancers.meta.k8s.io)
I0810 18:32:17.828665       1 event.go:294] "Event occurred" object="default/t-loadbalancer" fieldPath="" kind="Service" apiVersion="v1" type="Warning" reason="SyncLoadBalancerFailed" message="Error syncing load balancer: failed to ensure load balancer: create or update lb test1/kubernetes-default-t-loadbalancer-9c0294f6 failed, error: the server could not find the requested resource (post loadbalancers.meta.k8s.io)"
I0810 18:33:37.833933       1 event.go:294] "Event occurred" object="default/t-loadbalancer" fieldPath="" kind="Service" apiVersion="v1" type="Normal" reason="EnsuringLoadBalancer" message="Ensuring load balancer"
E0810 18:33:37.839629       1 controller.go:320] error processing service default/t-loadbalancer (will retry): failed to ensure load balancer: create or update lb test1/kubernetes-default-t-loadbalancer-9c0294f6 failed, error: the server could not find the requested resource (post loadbalancers.meta.k8s.io)
I0810 18:33:37.839926       1 event.go:294] "Event occurred" object="default/t-loadbalancer" fieldPath="" kind="Service" apiVersion="v1" type="Warning" reason="SyncLoadBalancerFailed" message="Error syncing load balancer: failed to ensure load balancer: create or update lb test1/kubernetes-default-t-loadbalancer-9c0294f6 failed, error: the server could not find the requested resource (post loadbalancers.meta.k8s.io)"
I0810 18:36:17.841324       1 event.go:294] "Event occurred" object="default/t-loadbalancer" fieldPath="" kind="Service" apiVersion="v1" type="Normal" reason="EnsuringLoadBalancer" message="Ensuring load balancer"
E0810 18:36:17.854856       1 controller.go:320] error processing service default/t-loadbalancer (will retry): failed to ensure load balancer: create or update lb test1/kubernetes-default-t-loadbalancer-9c0294f6 failed, error: the server could not find the requested resource (post loadbalancers.meta.k8s.io)
I0810 18:36:17.854901       1 event.go:294] "Event occurred" object="default/t-loadbalancer" fieldPath="" kind="Service" apiVersion="v1" type="Warning" reason="SyncLoadBalancerFailed" message="Error syncing load balancer: failed to ensure load balancer: create or update lb test1/kubernetes-default-t-loadbalancer-9c0294f6 failed, error: the server could not find the requested resource (post loadbalancers.meta.k8s.io)"
I0810 18:41:17.855371       1 event.go:294] "Event occurred" object="default/t-loadbalancer" fieldPath="" kind="Service" apiVersion="v1" type="Normal" reason="EnsuringLoadBalancer" message="Ensuring load balancer"
E0810 18:41:17.859136       1 controller.go:320] error processing service default/t-loadbalancer (will retry): failed to ensure load balancer: create or update lb test1/kubernetes-default-t-loadbalancer-9c0294f6 failed, error: the server could not find the requested resource (post loadbalancers.meta.k8s.io)
I0810 18:41:17.859304       1 event.go:294] "Event occurred" object="default/t-loadbalancer" fieldPath="" kind="Service" apiVersion="v1" type="Warning" reason="SyncLoadBalancerFailed" message="Error syncing load balancer: failed to ensure load balancer: create or update lb test1/kubernetes-default-t-loadbalancer-9c0294f6 failed, error: the server could not find the requested resource (post loadbalancers.meta.k8s.io)"
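
The service controller keeps retrying with backoff. As a convenience, the failing sync events can be watched directly instead of tailing the full controller log:

# On the guest cluster: watch just the load-balancer sync failures in the default namespace.
kubectl get events -n default --field-selector reason=SyncLoadBalancerFailed --watch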

@irishgordo

@yaocw2020 I've opened up #4401 to highlight the steps used while encountering the validation issue.

@irishgordo irishgordo assigned irishgordo and unassigned irishgordo Aug 10, 2023
@guangbochen
Contributor

@irishgordo Harvester v1.1.2 only supports RKE2 versions with CCM < v0.2.0, so for the upgrade test we need to use older RKE2 versions that are compatible with Harvester v1.1.2. I have updated the reproduce steps; could you please take another look? Thanks.

@irishgordo irishgordo self-assigned this Aug 15, 2023
@irishgordo
Copy link

Tested with:

  • RKE2 Kubernetes Version: v1.23.17+rke2r1
  • Rancher v2.7.4
  • Harvester v1.1.2 -> v1.2.0-rc5

And this looks good 😄 👍

I'll go ahead and close this out @yaocw2020

(Two screenshots attached: 2023-08-15 12-13-15 and 2023-08-15 11-26-01.)
