[BUG] Duplicate LB allocation happens during upgrading #4329

Closed
yaocw2020 opened this issue Jul 28, 2023 · 6 comments

@yaocw2020
Contributor

yaocw2020 commented Jul 28, 2023

Describe the bug

A duplicate Harvester load balancer IP allocation error occurs while upgrading from Harvester v1.1.2 to Harvester v1.2.0.

2023-07-27T07:41:25.859926166Z time="2023-07-27T07:41:25Z" level=error msg="error syncing 'default/kubernetes-default-nginx-fa60cbe7': handler harvester-lb-controller: allocate ip for lb default/kubernetes-default-nginx-fa60cbe7 failed, error: 10.84.103.205 has been allocated to default/kubernetes-default-nginx-fa60cbe7, duplicate allocation is not allowed, requeuing"
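
As a reference point, the LB objects and the addresses they already hold can be inspected on the Harvester cluster while the error repeats. This is a sketch only and assumes the LoadBalancer CRD is served under the loadbalancer.harvesterhci.io group; confirm the exact resource name with kubectl api-resources first:

# On the Harvester (host) cluster; confirm the exact CRD name before relying on it.
kubectl api-resources | grep -i loadbalancer
# List the LB objects and inspect the one named in the error above.
kubectl get loadbalancers.loadbalancer.harvesterhci.io -A
kubectl get loadbalancers.loadbalancer.harvesterhci.io -n default kubernetes-default-nginx-fa60cbe7 -o yaml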

To Reproduce
Steps to reproduce the behavior:

  1. Spin up a Harvester v1.1.2 cluster.
  2. Import Harvester into Rancher v2.6.x or v2.7.4.
  3. Spin up a guest RKE2 cluster with the Harvester cloud provider < v0.2.0 (use an RKE2 release based on Kubernetes v1.23.x, or on a version earlier than v1.24.15, v1.25.11, or v1.26.6).
  4. Configure a VIP pool.
  5. Create a load balancer service with the pool IPAM mode (a minimal example manifest is sketched after this list).
  6. Upgrade Harvester from v1.1.2 to v1.2.0-rc4.
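
For step 5, a minimal sketch of such a service manifest. The pool IPAM annotation is the one shown later in this thread; the name, selector, and ports are placeholders:

apiVersion: v1
kind: Service
metadata:
  name: nginx-lb                     # placeholder name
  namespace: default
  annotations:
    # ask the Harvester cloud provider to allocate the external IP from the VIP pool
    cloudprovider.harvesterhci.io/ipam: pool
spec:
  type: LoadBalancer
  selector:
    app: nginx                       # placeholder; must match the workload's labels
  ports:
    - name: http
      port: 8081
      targetPort: 80
      protocol: TCP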

Expected behavior

The LB should keep its original IP from the pool, without errors, during the upgrade.

Support bundle

Environment

  • Harvester ISO version: v1.1.2 to v1.2.0-rc4
  • Underlying Infrastructure (e.g. Baremetal with Dell PowerEdge R630): Baremetal with Dell PowerEdge R630

Additional context

@yaocw2020 yaocw2020 added kind/bug Issues that are defects reported by users or that we know have reached a real release reproduce/needed Reminder to add a reproduce label and to remove this one severity/needed Reminder to add a severity label and to remove this one area/load-balancer labels Jul 28, 2023
@yaocw2020 yaocw2020 self-assigned this Jul 28, 2023
@yaocw2020 yaocw2020 added this to the v1.2.0 milestone Jul 28, 2023
@guangbochen guangbochen added priority/1 Highly recommended to fix in this release not-require/test-plan Skip to create a e2e automation test issue required-for-rc/v1.2.0 labels Jul 30, 2023
@guangbochen guangbochen changed the title [BUG] Duplicate allocation happens during upgrading [BUG] Duplicate LB allocation happens during upgrading Jul 30, 2023
@guangbochen guangbochen added reproduce/always Reproducible 100% of the time severity/2 Function working but has a major issue w/o workaround (a major incident with significant impact) and removed reproduce/needed Reminder to add a reproduce label and to remove this one severity/needed Reminder to add a severity label and to remove this one labels Jul 30, 2023
@harvesterhci-io-github-bot

harvesterhci-io-github-bot commented Aug 4, 2023

Pre Ready-For-Testing Checklist

* [ ] If labeled: require/HEP Has the Harvester Enhancement Proposal PR submitted?
The HEP PR is at:

* [ ] Is there a workaround for the issue? If so, where is it documented?
The workaround is at:

  • Has the backend code been merged (harvester, harvester-installer, etc.) (including backport-needed/*)?
    The PR is at: Record the allocated IPs to allocated history during upgrading load-balancer-harvester#18

    • Does the PR include the explanation for the fix or the feature?

    * [ ] Does the PR include deployment change (YAML/Chart)? If so, where are the PRs for both YAML file and Chart?
    The PR for the YAML change is at:
    The PR for the chart change is at:

* [ ] If labeled: area/ui Has the UI issue filed or ready to be merged?
The UI issue/PR is at:

* [ ] If labeled: require/doc, require/knowledge-base Has the necessary document PR submitted or merged?
The documentation/KB PR is at:

* [ ] If NOT labeled: not-require/test-plan Has the e2e test plan been merged? Have QAs agreed on the automation test case? If only test case skeleton w/o implementation, have you created an implementation issue?
- The automation skeleton PR is at:
- The automation test case PR is at:

* [ ] If the fix introduces the code for backward compatibility Has a separate issue been filed with the label release/obsolete-compatibility?
The compatibility issue is filed at:

@irishgordo irishgordo self-assigned this Aug 8, 2023
@irishgordo

@yaocw2020 I had a few small questions surrounding the test setup.

Validation question:

Can the Rancher instance be a Docker-based installation of Rancher for testing, in conjunction with the ipxe-examples repo?

On v1.1.2, on an RKE2 guest cluster (a single-node VM on Harvester), I'm running into an issue where the load balancer seems to be stuck in a Pending state.
It was created from a deployment of nginx:latest (on container-0), with the ports specifying a service type of LoadBalancer and IPAM set to Pool.

The VIP of the Harvester cluster is 192.168.0.131.
The vip-pools configured under Settings are:

{
  "default": "192.168.0.10-192.168.0.60"
}
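
For reference, the same value can be read back from the Harvester cluster. A sketch only, assuming the pool is still stored in the vip-pools setting on v1.1.2:

# On the Harvester cluster (v1.1.2); adjust if the setting name differs in your version.
kubectl get settings.harvesterhci.io vip-pools -o jsonpath='{.value}'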

And the load balancer does seem to be stuck with "Error syncing load balancer":

Name:                     k-loadbalancer                                                                                       
Namespace:                default                                                                                              
Labels:                   <none>                                                                                               
Annotations:              cloudprovider.harvesterhci.io/ipam: pool                                                             
                          field.cattle.io/targetWorkloadIds: ["default/k"]                                                     
                          management.cattle.io/ui-managed: true                                                                
Selector:                 workload.user.cattle.io/workloadselector=apps.deployment-default-k                                   
Type:                     LoadBalancer                                                                                         
IP Family Policy:         SingleStack                                                                                          
IP Families:              IPv4                                                                                                 
IP:                       10.43.181.164                                                                                        
IPs:                      10.43.181.164                                                                                        
Port:                     s  8081/TCP                                                                                          
TargetPort:               80/TCP                                                                                               
NodePort:                 s  32373/TCP                                                                                         
Endpoints:                10.42.179.152:80                                                                                     
Session Affinity:         None                                                                                                 
External Traffic Policy:  Cluster                                                                                              
Events:                                                                                                                        
  Type     Reason                  Age                    From                Message                                          
  ----     ------                  ----                   ----                -------                                          
  Normal   EnsuringLoadBalancer    2m15s (x7 over 7m30s)  service-controller  Ensuring load balancer                           
  Warning  SyncLoadBalancerFailed  2m15s (x7 over 7m30s)  service-controller  Error syncing load balancer: failed to ensure load balancer: create or update lb default/kubernetes-default-k-loadbalancer-0a9d6792 failed, error: the server could not find the requested resource (post loadbalancers.meta.k8s.io)
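
Since the discussion below points at the cloud provider version shipped in the guest cluster, one quick check (standard kubectl, run against the guest cluster) is the image that the harvester-cloud-provider deployment is actually running:

# On the guest RKE2 cluster: confirm which harvester-cloud-provider image/version is deployed.
kubectl -n kube-system get deployment harvester-cloud-provider \
  -o jsonpath='{.spec.template.spec.containers[0].image}{"\n"}'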

@irishgordo

Additionally, here are some of the logs:

╭─mike at suse-workstation-team-harvester in ~
╰─○ kubectl get helmchartconfigs.helm.cattle.io -n kube-system harvester-cloud-provider -o yaml 
apiVersion: helm.cattle.io/v1
kind: HelmChartConfig
metadata:
  annotations:
    objectset.rio.cattle.io/applied: H4sIAAAAAAAA/4SRPU/DMBCG/wq6OanbNE2KJQbUBYmFATF1udiXxsQfkX0JQlX/O0oLEgylo2W/j5/37gg4mDeKyQQPEjqybqGQ2dLCBDGtIIPeeA0Snsi6XYeRd8G35gAZOGLUyAjyCOh9YGQTfJqPoXknxYl4EU34BTQzCbKr9+HDU8wPUw8S+nX6o5LdPRuvHx61Dv4mwqMjkODQ44F0rmbxXP2Y386mAdUM6MeG8vSZmBycMlCRziVfjaPE6AaQfrQ2A4sN2X+rd5g6kKDbulpWShcF1m19vyqrdYtYl6ttuVxvN6rcNEWltzj/9l2iwzhRYoq5smHU+RDDZDRFuLy4opoGUrPPhHaktAueyTNIOO7hjLms8QW524Pcg5gwCmsaEdGrjqKIPRWCWInL1PLWWEriHIXT6SsAAP//kKHxnTgCAAA
    objectset.rio.cattle.io/id: ""
    objectset.rio.cattle.io/owner-gvk: k3s.cattle.io/v1, Kind=Addon
    objectset.rio.cattle.io/owner-name: managed-chart-config
    objectset.rio.cattle.io/owner-namespace: kube-system
  creationTimestamp: "2023-08-10T18:15:24Z"
  generation: 1
  labels:
    objectset.rio.cattle.io/hash: df7606cd22a7f791463faa741840385c45b26d8a
  name: harvester-cloud-provider
  namespace: kube-system
  resourceVersion: "303"
  uid: 71e85294-1cd6-4945-a2c3-4454ae76778b
spec:
  valuesContent: '{"cloudConfigPath":"/var/lib/rancher/rke2/etc/config-files/cloud-provider-config","clusterName":"b","global":{"cattle":{"clusterId":"c-m-6lqf54hp"}}}'
╭─mike at suse-workstation-team-harvester in ~
╰─○ kubectl get pods -n kube-system | grep -ie "cloud-provider"
harvester-cloud-provider-59947c686d-nfkkv               1/1     Running     0          29m
helm-install-harvester-cloud-provider-8r42l             0/1     Completed   0          29m
╭─mike at suse-workstation-team-harvester in ~
╰─○ kubectl logs -n kube-system harvester-cloud-provider-59947c686d-nfkkv --follow
I0810 18:16:36.155187       1 serving.go:348] Generated self-signed cert in-memory
W0810 18:16:36.330352       1 client_config.go:617] Neither --kubeconfig nor --master was specified.  Using the inClusterConfig.  This might not work.
W0810 18:16:36.333381       1 main.go:85] detected a cluster without a ClusterID.  A ClusterID will be required in the future.  Please tag your cluster to avoid any future issues
I0810 18:16:36.333559       1 controllermanager.go:145] Version: v0.0.0-master+$Format:%H$
I0810 18:16:36.334561       1 secure_serving.go:210] Serving securely on [::]:10258
I0810 18:16:36.334660       1 tlsconfig.go:240] "Starting DynamicServingCertificateController"
I0810 18:16:36.334983       1 leaderelection.go:248] attempting to acquire leader lease kube-system/cloud-controller-manager...
I0810 18:16:36.345946       1 leaderelection.go:258] successfully acquired lease kube-system/cloud-controller-manager
I0810 18:16:36.346986       1 event.go:294] "Event occurred" object="kube-system/cloud-controller-manager" fieldPath="" kind="Lease" apiVersion="coordination.k8s.io/v1" type="Normal" reason="LeaderElection" message="b-pool1-24e22b1d-lq5mg_ad86693d-4e93-4e90-b4d0-d2d14cbb01e2 became leader"
W0810 18:16:36.655251       1 core.go:111] --configure-cloud-routes is set, but cloud provider does not support routes. Will not configure cloud provider routes.
W0810 18:16:36.655568       1 controllermanager.go:289] Skipping "route"
I0810 18:16:36.655883       1 controllermanager.go:301] Started "cloud-node"
I0810 18:16:36.655969       1 controllermanager.go:301] Started "cloud-node-lifecycle"
I0810 18:16:36.656137       1 controllermanager.go:301] Started "service"
I0810 18:16:36.656437       1 node_controller.go:157] Sending events to api server.
I0810 18:16:36.656604       1 node_controller.go:166] Waiting for informer caches to sync
I0810 18:16:36.656686       1 node_lifecycle_controller.go:113] Sending events to api server
I0810 18:16:36.656823       1 controller.go:237] Starting service controller
I0810 18:16:36.656897       1 shared_informer.go:252] Waiting for caches to sync for service
I0810 18:16:36.757235       1 shared_informer.go:259] Caches are synced for service
I0810 18:16:36.757282       1 node_controller.go:415] Initializing node b-pool1-24e22b1d-lq5mg with cloud provider
I0810 18:16:36.796419       1 node_controller.go:484] Successfully initialized node b-pool1-24e22b1d-lq5mg with cloud provider
I0810 18:16:36.796905       1 event.go:294] "Event occurred" object="b-pool1-24e22b1d-lq5mg" fieldPath="" kind="Node" apiVersion="v1" type="Normal" reason="Synced" message="Node synced successfully"
time="2023-08-10T18:16:36Z" level=info msg="Starting kubevirt.io/v1, Kind=VirtualMachineInstance controller"
time="2023-08-10T18:16:36Z" level=info msg="Starting /v1, Kind=Node controller"
time="2023-08-10T18:16:36Z" level=info msg="Starting /v1, Kind=Service controller"
I0810 18:31:02.716076       1 event.go:294] "Event occurred" object="default/t-loadbalancer" fieldPath="" kind="Service" apiVersion="v1" type="Normal" reason="EnsuringLoadBalancer" message="Ensuring load balancer"
I0810 18:31:02.787273       1 event.go:294] "Event occurred" object="default/t-loadbalancer" fieldPath="" kind="Service" apiVersion="v1" type="Warning" reason="SyncLoadBalancerFailed" message="Error syncing load balancer: failed to ensure load balancer: create or update lb test1/kubernetes-default-t-loadbalancer-9c0294f6 failed, error: the server could not find the requested resource (post loadbalancers.meta.k8s.io)"
E0810 18:31:02.788078       1 controller.go:320] error processing service default/t-loadbalancer (will retry): failed to ensure load balancer: create or update lb test1/kubernetes-default-t-loadbalancer-9c0294f6 failed, error: the server could not find the requested resource (post loadbalancers.meta.k8s.io)
I0810 18:31:07.789727       1 event.go:294] "Event occurred" object="default/t-loadbalancer" fieldPath="" kind="Service" apiVersion="v1" type="Normal" reason="EnsuringLoadBalancer" message="Ensuring load balancer"
E0810 18:31:07.793597       1 controller.go:320] error processing service default/t-loadbalancer (will retry): failed to ensure load balancer: create or update lb test1/kubernetes-default-t-loadbalancer-9c0294f6 failed, error: the server could not find the requested resource (post loadbalancers.meta.k8s.io)
I0810 18:31:07.793636       1 event.go:294] "Event occurred" object="default/t-loadbalancer" fieldPath="" kind="Service" apiVersion="v1" type="Warning" reason="SyncLoadBalancerFailed" message="Error syncing load balancer: failed to ensure load balancer: create or update lb test1/kubernetes-default-t-loadbalancer-9c0294f6 failed, error: the server could not find the requested resource (post loadbalancers.meta.k8s.io)"
I0810 18:31:17.799480       1 event.go:294] "Event occurred" object="default/t-loadbalancer" fieldPath="" kind="Service" apiVersion="v1" type="Normal" reason="EnsuringLoadBalancer" message="Ensuring load balancer"
E0810 18:31:17.815807       1 controller.go:320] error processing service default/t-loadbalancer (will retry): failed to ensure load balancer: create or update lb test1/kubernetes-default-t-loadbalancer-9c0294f6 failed, error: the server could not find the requested resource (post loadbalancers.meta.k8s.io)
I0810 18:31:17.816635       1 event.go:294] "Event occurred" object="default/t-loadbalancer" fieldPath="" kind="Service" apiVersion="v1" type="Warning" reason="SyncLoadBalancerFailed" message="Error syncing load balancer: failed to ensure load balancer: create or update lb test1/kubernetes-default-t-loadbalancer-9c0294f6 failed, error: the server could not find the requested resource (post loadbalancers.meta.k8s.io)"
I0810 18:31:37.819161       1 event.go:294] "Event occurred" object="default/t-loadbalancer" fieldPath="" kind="Service" apiVersion="v1" type="Normal" reason="EnsuringLoadBalancer" message="Ensuring load balancer"
E0810 18:31:37.820666       1 controller.go:320] error processing service default/t-loadbalancer (will retry): failed to ensure load balancer: create or update lb test1/kubernetes-default-t-loadbalancer-9c0294f6 failed, error: the server could not find the requested resource (post loadbalancers.meta.k8s.io)
I0810 18:31:37.820709       1 event.go:294] "Event occurred" object="default/t-loadbalancer" fieldPath="" kind="Service" apiVersion="v1" type="Warning" reason="SyncLoadBalancerFailed" message="Error syncing load balancer: failed to ensure load balancer: create or update lb test1/kubernetes-default-t-loadbalancer-9c0294f6 failed, error: the server could not find the requested resource (post loadbalancers.meta.k8s.io)"
I0810 18:32:17.825060       1 event.go:294] "Event occurred" object="default/t-loadbalancer" fieldPath="" kind="Service" apiVersion="v1" type="Normal" reason="EnsuringLoadBalancer" message="Ensuring load balancer"
E0810 18:32:17.828485       1 controller.go:320] error processing service default/t-loadbalancer (will retry): failed to ensure load balancer: create or update lb test1/kubernetes-default-t-loadbalancer-9c0294f6 failed, error: the server could not find the requested resource (post loadbalancers.meta.k8s.io)
I0810 18:32:17.828665       1 event.go:294] "Event occurred" object="default/t-loadbalancer" fieldPath="" kind="Service" apiVersion="v1" type="Warning" reason="SyncLoadBalancerFailed" message="Error syncing load balancer: failed to ensure load balancer: create or update lb test1/kubernetes-default-t-loadbalancer-9c0294f6 failed, error: the server could not find the requested resource (post loadbalancers.meta.k8s.io)"
I0810 18:33:37.833933       1 event.go:294] "Event occurred" object="default/t-loadbalancer" fieldPath="" kind="Service" apiVersion="v1" type="Normal" reason="EnsuringLoadBalancer" message="Ensuring load balancer"
E0810 18:33:37.839629       1 controller.go:320] error processing service default/t-loadbalancer (will retry): failed to ensure load balancer: create or update lb test1/kubernetes-default-t-loadbalancer-9c0294f6 failed, error: the server could not find the requested resource (post loadbalancers.meta.k8s.io)
I0810 18:33:37.839926       1 event.go:294] "Event occurred" object="default/t-loadbalancer" fieldPath="" kind="Service" apiVersion="v1" type="Warning" reason="SyncLoadBalancerFailed" message="Error syncing load balancer: failed to ensure load balancer: create or update lb test1/kubernetes-default-t-loadbalancer-9c0294f6 failed, error: the server could not find the requested resource (post loadbalancers.meta.k8s.io)"
I0810 18:36:17.841324       1 event.go:294] "Event occurred" object="default/t-loadbalancer" fieldPath="" kind="Service" apiVersion="v1" type="Normal" reason="EnsuringLoadBalancer" message="Ensuring load balancer"
E0810 18:36:17.854856       1 controller.go:320] error processing service default/t-loadbalancer (will retry): failed to ensure load balancer: create or update lb test1/kubernetes-default-t-loadbalancer-9c0294f6 failed, error: the server could not find the requested resource (post loadbalancers.meta.k8s.io)
I0810 18:36:17.854901       1 event.go:294] "Event occurred" object="default/t-loadbalancer" fieldPath="" kind="Service" apiVersion="v1" type="Warning" reason="SyncLoadBalancerFailed" message="Error syncing load balancer: failed to ensure load balancer: create or update lb test1/kubernetes-default-t-loadbalancer-9c0294f6 failed, error: the server could not find the requested resource (post loadbalancers.meta.k8s.io)"
I0810 18:41:17.855371       1 event.go:294] "Event occurred" object="default/t-loadbalancer" fieldPath="" kind="Service" apiVersion="v1" type="Normal" reason="EnsuringLoadBalancer" message="Ensuring load balancer"
E0810 18:41:17.859136       1 controller.go:320] error processing service default/t-loadbalancer (will retry): failed to ensure load balancer: create or update lb test1/kubernetes-default-t-loadbalancer-9c0294f6 failed, error: the server could not find the requested resource (post loadbalancers.meta.k8s.io)
I0810 18:41:17.859304       1 event.go:294] "Event occurred" object="default/t-loadbalancer" fieldPath="" kind="Service" apiVersion="v1" type="Warning" reason="SyncLoadBalancerFailed" message="Error syncing load balancer: failed to ensure load balancer: create or update lb test1/kubernetes-default-t-loadbalancer-9c0294f6 failed, error: the server could not find the requested resource (post loadbalancers.meta.k8s.io)"
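
The service controller keeps retrying with backoff. As a convenience, the failing sync events can be watched directly instead of tailing the full controller log:

# On the guest cluster: watch just the load-balancer sync failures in the default namespace.
kubectl get events -n default --field-selector reason=SyncLoadBalancerFailed --watch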

@irishgordo

@yaocw2020 I've opened up #4401 to highlight the steps used while encountering the validation issue.

@irishgordo irishgordo assigned irishgordo and unassigned irishgordo Aug 10, 2023
@guangbochen
Contributor

@irishgordo Harvester v1.1.2 only supports RKE2 versions with CCM < v0.2.0, so for the upgrade test we need to use older RKE2 versions that are compatible with Harvester v1.1.2. I have updated the reproduce steps; could you please take another look? Thanks.

@irishgordo irishgordo self-assigned this Aug 15, 2023
@irishgordo
Copy link

Tested with:

  • RKE2 Kubernetes Version: v1.23.17+rke2r1
  • Rancher v2.7.4
  • Harvester v1.1.2 -> v1.2.0-rc5

And this looks good 😄 👍

I'll go ahead and close this out @yaocw2020

(Two screenshots attached: 2023-08-15 12-13-15 and 2023-08-15 11-26-01.)
