Fix nodebalancer config rebuilds to include IDs of preexisting nodebalancer nodes #192

Conversation

@akaokunc (Contributor) commented Mar 14, 2024

Currently nodebalancer configs are rebuilt when nodes change, but that causes the nodebalancer to reload, which can have some impact on performance. For nodes which didn't change this is not necessary, but we are not sending node IDs. This PR bumps linodego to 1.30 (to allow sending these IDs), and before rebuilding the nodebalancer config it makes an API request to obtain the IDs of the current nodes.
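
To illustrate the approach, here is a minimal Go sketch (not the PR's actual code): it lists the nodes currently attached to the config, maps their addresses to IDs, and sets the ID on matching entries before the rebuild call. The helper name rebuildWithExistingIDs is made up, and the ID field on the rebuild node options is assumed to be the one linodego 1.30 exposes.

package linode

import (
	"context"
	"fmt"

	"github.com/linode/linodego"
)

// rebuildWithExistingIDs is a hypothetical helper: before rebuilding a config,
// it looks up the nodes currently attached to it and copies their IDs onto the
// matching entries in the rebuild request, so unchanged backends keep their IDs.
func rebuildWithExistingIDs(ctx context.Context, client *linodego.Client, nbID, configID int, opts linodego.NodeBalancerConfigRebuildOptions) error {
	existing, err := client.ListNodeBalancerNodes(ctx, nbID, configID, nil)
	if err != nil {
		return fmt.Errorf("listing existing nodebalancer nodes: %w", err)
	}

	// Map "ip:port" address to the node's current ID.
	idByAddress := make(map[string]int, len(existing))
	for _, n := range existing {
		idByAddress[n.Address] = n.ID
	}

	// Nodes that already exist get their ID set; new nodes are sent without one.
	for i := range opts.Nodes {
		if id, ok := idByAddress[opts.Nodes[i].Address]; ok {
			opts.Nodes[i].ID = id // assumed: ID on rebuild nodes is what linodego 1.30 adds
		}
	}

	_, err = client.RebuildNodeBalancerConfig(ctx, nbID, configID, opts)
	return err
}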

Here is a recorded POST request after this change. Node 192.168.242.85:31059 was added while the others were kept from the original config, so they have IDs set.

POST  /v4/nodebalancers/xxxx/configs/xxxx/rebuild  HTTP/1.1
HOST   : censored
HEADERS:
	Accept: application/json
	Authorization: Bearer censored
	Content-Type: application/json
	User-Agent: linode-cloud-controller-manager linodego/v1.30.0 https://github.com/linode/linodego
BODY   :
{
   "port": 80,
   "protocol": "tcp",
   "proxy_protocol": "none",
   "check": "connection",
   "check_interval": 5,
   "check_attempts": 2,
   "check_passive": true,
   "check_timeout": 3,
   "nodes": [
      {
         "address": "192.168.191.200:31059",
         "label": "okunc-c5-okunc-c1-3",
         "weight": 100,
         "mode": "accept",
         "id": 96539997
      },
      {
         "address": "192.168.242.85:31059",
         "label": "okunc-c5-okunc-c1-4",
         "weight": 100,
         "mode": "accept"
      },
      {
         "address": "192.168.242.39:31059",
         "label": "okunc-c5-okunc-c1-1",
         "weight": 100,
         "mode": "accept",
         "id": 96540073
      },
      {
         "address": "192.168.242.45:31059",
         "label": "okunc-c5-okunc-c1-2",
         "weight": 100,
         "mode": "accept",
         "id": 96540074
      }
   ]
}

General:

  • Have you removed all sensitive information, including but not limited to access keys and passwords?
  • Have you checked to ensure there aren't other open or closed Pull Requests for the same bug/feature/question?

Pull Request Guidelines:

  1. Does your submission pass tests?
  2. Have you added tests?
  3. Are you addressing a single feature in this PR?
  4. Are your commits atomic, addressing one change per commit?
  5. Are you following the conventions of the language?
  6. Have you saved your large formatting changes for a different PR, so we can focus on your work?
  7. Have you explained your rationale for why this feature is needed?
  8. Have you linked your PR to an open issue?

@akaokunc (Contributor, Author) commented:

To get all the CI steps to pass I had to drop support for 1.20, which is unable to build the project after bumping the k8s deps and tidying the modules.

I'm not really sure what we should do when the node_controller or service_controller is unable to register the informers; for now I'm just logging the error. See the implementation of the change which introduced these errors: kubernetes/client-go@ecdc8bf
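
For context, here is a minimal sketch of the "just log it" approach, assuming typical client-go informer wiring; startNodeWatch and the handler body are illustrative, not the controllers' actual code. With the referenced client-go change, AddEventHandler returns a registration handle and an error rather than nothing.

package linode

import (
	v1 "k8s.io/api/core/v1"
	"k8s.io/client-go/informers"
	"k8s.io/client-go/tools/cache"
	"k8s.io/klog/v2"
)

// startNodeWatch is a hypothetical example of registering an event handler
// after kubernetes/client-go@ecdc8bf: AddEventHandler now returns
// (ResourceEventHandlerRegistration, error) instead of nothing.
func startNodeWatch(factory informers.SharedInformerFactory) {
	nodeInformer := factory.Core().V1().Nodes().Informer()

	_, err := nodeInformer.AddEventHandler(cache.ResourceEventHandlerFuncs{
		UpdateFunc: func(oldObj, newObj interface{}) {
			node, ok := newObj.(*v1.Node)
			if !ok {
				return
			}
			klog.V(3).Infof("node %s updated", node.Name)
		},
	})
	if err != nil {
		// Registration failed; for now the error is only logged, as described above.
		klog.Errorf("failed to register node event handler: %v", err)
	}
}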

@akaokunc force-pushed the okunc/fix_nodebalancer_config_rebuild_2nd_attempt branch from e5f931c to 50ecce4 on March 26, 2024 15:05
@okokes-akamai (Contributor) left a comment:

A few minor comments here or there.

One more important point from me is that I'd ideally like to see this reflected in (unit) tests: what the behaviour was like before this change and what we're trying to achieve here. Just to avoid regressions.

@rahulait (Contributor) commented:

Can we have the CI tests pass for this PR? Also, I'm not sure if we have tested building the image and running it on a k8s cluster. I am seeing pods crashlooping with this branch, hence checking. It looks like it's complaining due to the change in main.go:

      --vmodule pattern=N,...          comma-separated list of pattern=N settings for file-filtered logging (only works for text log format)

error: unknown flag: --port

@akaokunc (Contributor, Author) commented Apr 3, 2024

> Can we have the CI tests pass for this PR? Also, I'm not sure if we have tested building the image and running it on a k8s cluster. I am seeing pods crashlooping with this branch, hence checking. It looks like it's complaining due to the change in main.go:
>
>       --vmodule pattern=N,...          comma-separated list of pattern=N settings for file-filtered logging (only works for text log format)
>
> error: unknown flag: --port

Is that still the case? I believe it was some glitch during rebasing; now the changes in main.go are only those really needed to bump linodego and, subsequently, the k8s deps.

@akaokunc (Contributor, Author) commented Apr 8, 2024

E2E tests passed.

Ran 22 of 23 Specs in 467.720 seconds
SUCCESS! -- 22 Passed | 0 Failed | 0 Pending | 1 Skipped
PASS

I had to make a manifest change:

<             - --leader-elect-resource-lock=endpoints
>             - --leader-elect-resource-lock=leases

@rahulait (Contributor) commented Apr 9, 2024

Can we also add the following diff to this PR so that helm installs work fine after the change?

diff --git a/deploy/chart/templates/daemonset.yaml b/deploy/chart/templates/daemonset.yaml
index 6a38e6e..6931591 100644
--- a/deploy/chart/templates/daemonset.yaml
+++ b/deploy/chart/templates/daemonset.yaml
@@ -29,9 +29,8 @@ spec:
           imagePullPolicy: {{ .Values.image.pullPolicy }}
           name: ccm-linode
           args:
-            - --leader-elect-resource-lock=endpoints
+            - --leader-elect-resource-lock=leases
             - --v=3
-            - --port=0
             - --secure-port=10253
             {{- if .Values.linodegoDebug }}
             - --linodego-debug={{ .Values.linodegoDebug }}

@rahulait (Contributor) commented Apr 9, 2024

Also, once I switch to leases instead of endpoints, I see the following error on all CCM pods:

E0409 04:33:57.324416       1 leaderelection.go:332] error retrieving resource lock kube-system/cloud-controller-manager: leases.coordination.k8s.io "cloud-controller-manager" is forbidden: User "system:serviceaccount:kube-system:ccm-linode" cannot get resource "leases" in API group "coordination.k8s.io" in the namespace "kube-system"
E0409 04:34:01.398603       1 leaderelection.go:332] error retrieving resource lock kube-system/cloud-controller-manager: leases.coordination.k8s.io "cloud-controller-manager" is forbidden: User "system:serviceaccount:kube-system:ccm-linode" cannot get resource "leases" in API group "coordination.k8s.io" in the namespace "kube-system"
E0409 04:34:05.004898       1 leaderelection.go:332] error retrieving resource lock kube-system/cloud-controller-manager: leases.coordination.k8s.io "cloud-controller-manager" is forbidden: User "system:serviceaccount:kube-system:ccm-linode" cannot get resource "leases" in API group "coordination.k8s.io" in the namespace "kube-system"

One can test whether helm install works by following these steps:

  1. Uninstall existing linode-ccm
  2. Clone the linode-ccm repo with branch checked out
  3. Run helm install to install ccm
export LINODE_API_TOKEN=<linodeapitoken>
export REGION=<linoderegion>
helm install ccm-linode --set apiToken=$LINODE_API_TOKEN,region=$REGION deploy/chart/

@rahulait (Contributor) commented Apr 9, 2024

Not sure why it's not happening in the cluster you are testing. Can we add these changes as well? This seems to fix my issue:

diff --git a/deploy/chart/templates/clusterrole-rbac.yaml b/deploy/chart/templates/clusterrole-rbac.yaml
index 2953213..3d18d68 100644
--- a/deploy/chart/templates/clusterrole-rbac.yaml
+++ b/deploy/chart/templates/clusterrole-rbac.yaml
@@ -6,6 +6,9 @@ rules:
   - apiGroups: [""]
     resources: ["endpoints"]
     verbs: ["get", "watch", "list", "update", "create"]
+  - apiGroups: ["coordination.k8s.io"]
+    resources: ["leases"]
+    verbs: ["get", "watch", "list", "update", "create"]
   - apiGroups: [""]
     resources: ["nodes"]
     verbs: ["get", "watch", "list", "update", "delete", "patch"]

@rahulait (Contributor) left a comment:

LGTM

@akaokunc merged commit 4899055 into linode:main on Apr 9, 2024
5 checks passed