Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Implement NodeExcludeBalancers to exclude nodes as external loadbalancer #2073

Merged
merged 1 commit into from Jan 24, 2024

Conversation

cyclinder
Copy link
Contributor

@cyclinder cyclinder commented Sep 11, 2023

The label specifies that the node should not be considered as a target for external load-balancers which use nodes as a second hop.

Fixed #2021

@cyclinder
Copy link
Contributor Author

@fedepaol I'm not sure this change is in the right place, Can you check it for me?Thanks

@fedepaol
Copy link
Member

@fedepaol I'm not sure this change is in the right place, Can you check it for me?Thanks

I'd follow what we do for the networknotavailablecondition:

if k8snodes.IsNetworkUnavailable(nodes[c.myNode]) {

if k8snodes.IsNetworkUnavailable(nodes[s]) {

should be pretty easy.

Also, please add a couple of e2e tests.

@fedepaol
Copy link
Member

Also also, please fix the commit message

@fedepaol
Copy link
Member

fedepaol commented Oct 27, 2023

@cyclinder still interested in working on this (no rush, just checking)?

@cyclinder
Copy link
Contributor Author

@fedepaol hey,sorry for the delay. yes, but I am sucked in working these days. I'll working on this next weekend.

@fedepaol
Copy link
Member

@fedepaol hey,sorry for the delay. yes, but I am sucked in working these days. I'll working on this next weekend.

no problem, I know the feeling :-)
Thanks again!

Copy link

This PR has been automatically marked as stale because it has been open 30 days
with no activity. This issue will be closed in 10 days unless you do one of the following:

  • respond or perform some activity on this PR
  • have one of these labels applied: hold

@onesb23
Copy link

onesb23 commented Nov 27, 2023

This problem is relevant for me.

@cyclinder
Copy link
Contributor Author

I apologize for the delay. I will work on this in the next few days and address the CI failures.

@onesb23
Copy link

onesb23 commented Nov 27, 2023

@cyclinder Thanks, I appreciate it 🤝

@cyclinder cyclinder changed the title feature: make the node's label "node.kubernetes.io/exclude-from-exter… make the label "node.kubernetes.io/exclude-from-external-load-balancers" valuable Dec 4, 2023
@cyclinder cyclinder force-pushed the exclude_nodes branch 2 times, most recently from df458cc to 7e15972 Compare December 4, 2023 05:38
@cyclinder cyclinder force-pushed the exclude_nodes branch 3 times, most recently from e601394 to efe7335 Compare December 12, 2023 05:58
@cyclinder
Copy link
Contributor Author

Sorry for the delay again.

Before adding the go.work at the root of the project, it was really hard to add code for e2etests, and it was challenging to run inv e2etest locally. I'm not sure if it's an issue with my local environment or something else. After adding the go.work file as follows, everything went smoothly.

go 1.21.4

use (
	.
	./e2etest
)

The previous CI failures were because when creating a cluster with kind, it labels the control-plane node with "node.kubernetes.io/exclude-from-external-load-balancers" by default., and this PR implements the value of this label, causing some specs to fail.

I would like to manually remove this label after starting the Kind cluster (to avoid affecting existing e2e tests), and then separately add an e2e test specifically for testing this PR. This e2e will include the following steps:

  • Create a service with ETP=Local and announce it to the control-plane node.
  • Label the control-plane node with label: "" and manually trigger a configuration update.
  • Verify if it is re-announced to other nodes.

To avoid impacting other e2e tests, this e2e must run serially. WDYT?

e2etest/pkg/frr/validation.go Outdated Show resolved Hide resolved
e2etest/pkg/k8s/nodes.go Outdated Show resolved Hide resolved
@@ -1140,6 +1140,11 @@ func selectedNodes(nodes []corev1.Node, selectors []metav1.LabelSelector) (map[s
res := make(map[string]bool)
OUTER:
for _, node := range nodes {
_, ok := node.Labels[corev1.LabelNodeExcludeBalancers]
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this is the wrong place to add the logic, as the function is "selected nodes".

I'd rather do the filtering in the controller (when we build the resources structure), or on top of config.For and carrying around the filtered node slices.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Or, you can even add the filter in the config controller's predicate.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How about the latest changes? I prefer to do these filters in the speaker's shouldAnnoce, it's simply and we can easily add logs to see what happed.

Copy link
Contributor Author

@cyclinder cyclinder Jan 10, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I tested it on my local machine, It works well:

➜  metallb git:(exclude_nodes) ✗ kubectl describe svc -n l2-6976 external-local-lb | grep nodeAssigned
  Normal  nodeAssigned  24m (x6 over 45m)  metallb-speaker  announcing from node "kind-worker2" with protocol "layer2"

➜  metallb git:(exclude_nodes) ✗ kubectl label nodes kind-worker2 node.kubernetes.io/exclude-from-external-load-balancers=""
node/kind-worker2 labeled
➜  metallb git:(exclude_nodes) ✗ kubectl describe svc -n l2-6976 external-local-lb | grep nodeAssigned                      
  Normal  nodeAssigned  24m (x6 over 45m)  metallb-speaker  announcing from node "kind-worker2" with protocol "layer2"
  Normal  nodeAssigned  3s (x2 over 38m)   metallb-speaker  announcing from node "kind-control-plane" with protocol "layer2"

➜  metallb git:(exclude_nodes) ✗ kubectl label nodes kind-worker2 node.kubernetes.io/exclude-from-external-load-balancers-  
node/kind-worker2 unlabeled
➜  metallb git:(exclude_nodes) ✗ kubectl describe svc -n l2-6976 external-local-lb | grep nodeAssigned                    
  Normal  nodeAssigned  55s (x2 over 39m)  metallb-speaker  announcing from node "kind-control-plane" with protocol "layer2"
  Normal  nodeAssigned  4s (x7 over 46m)   metallb-speaker  announcing from node "kind-worker2" with protocol "layer2"

@fedepaol
Copy link
Member

Sorry for the delay again.

Before adding the go.work at the root of the project, it was really hard to add code for e2etests, and it was challenging to run inv e2etest locally. I'm not sure if it's an issue with my local environment or something else. After adding the go.work file as follows, everything went smoothly.

Weird, I did not have any issues, when opening directly the e2etests folder both with vim and vscode. We have a hardcoded go.work file there to fetch the metallb api.

go 1.21.4

use (
	.
	./e2etest
)

The previous CI failures were because when creating a cluster with kind, it labels the control-plane node with "node.kubernetes.io/exclude-from-external-load-balancers" by default., and this PR implements the value of this label, causing some specs to fail.

I would like to manually remove this label after starting the Kind cluster (to avoid affecting existing e2e tests),

I'd do it as part of inv dev-env

and then separately add an e2e test specifically for testing this PR. This e2e will include the following steps:

  • Create a service with ETP=Local and announce it to the control-plane node.
  • Label the control-plane node with label: "" and manually trigger a configuration update.
  • Verify if it is re-announced to other nodes.

Not sure I understood. Here's what I'd do:

  • With L2 advertisement:
  • Announce the service and check which node is being used
  • Add hte label, the service should move
  • Remove the label, the service should get back to the original node

To avoid impacting other e2e tests, this e2e must run serially. WDYT?

All the e2e tests run serially.

@fedepaol
Copy link
Member

I would also add a filter in the node controller

@cyclinder cyclinder marked this pull request as draft December 12, 2023 10:51
@cyclinder cyclinder force-pushed the exclude_nodes branch 8 times, most recently from eb17eef to 1751aab Compare January 23, 2024 05:46
@cyclinder
Copy link
Contributor Author

Jan 23 03:18:27.[335](https://github.com/metallb/metallb/actions/runs/7620493254/job/20755437194?pr=2073#step:6:336): INFO: Waiting up to 2m0s for 1 pods to be running and ready: [external-local-lb-g277x]
Jan 23 03:18:29.341: INFO: Wanted all 1 pods to be running and ready. Result: true. Pods: [external-local-lb-g277x]
STEP: getting the advertising node @ 01/23/24 03:18:29.343
STEP: add the NodeExcludeBalancers label of the node @ 01/23/24 03:18:29.345
STEP: event.Message = announcing from node "kind-worker2" with protocol "layer2", event.Time = 1705979909000000000 
 @ 01/23/24 03:18:30.353
STEP: event.Message = announcing from node "kind-worker" with protocol "layer2", event.Time = 1705979909000000000

From the debug logs, this looks strange, the lastTimeStep is the same for different events, which causes k8s.GetSvcNode to return an incorrect node.

@cyclinder cyclinder force-pushed the exclude_nodes branch 7 times, most recently from a6bc4ec to cbbf287 Compare January 24, 2024 02:18
@cyclinder
Copy link
Contributor Author

@fedepaol All the tests have passed, We need the 1s sleep, Otherwise, the lastTimeStep is the same for different events, which causes k8s.GetSvcNode to return an incorrect node.

@fedepaol
Copy link
Member

@fedepaol All the tests have passed, We need the 1s sleep, Otherwise, the lastTimeStep is the same for different events, which causes k8s.GetSvcNode to return an incorrect node.

Sorry if I insist, but I am probably missing something. Which GetSvcNode are you referring to? This one (the one after the sleep) should work either way, as it's wrapped by eventually? And the previous one is before the sleep, so it won't be affected.

Also, if we decide the sleep is mandatory, we should add a big comment to explain why.

@cyclinder
Copy link
Contributor Author

Please the #2073 (comment), The second GetSvcNode actually gets two NodeAssigned events, but their times are the same, causing Eventually to always get the wrong node name. The reason for adding sleep before AddLabelToNode is to make sure that the second Event's time is later than the first Event's time. so we can get the correct node name.

@fedepaol
Copy link
Member

Please the #2073 (comment), The second GetSvcNode actually gets two NodeAssigned events, but their times are the same, causing Eventually to always get the wrong node name. The reason for adding sleep before AddLabelToNode is to make sure that the second Event's time is later than the first Event's time. so we can get the correct node name.

Now I get it! So what you are actually delaying is adding the labels so the even has the right granularity.
This will change when we'll (eventually) add the status crds (one day!).
Just add a comment so it's clear, please.

If the node has labeled "node.kubernetes.io/exclude-from-external-load-balancers", It specifies that the node should not be considered
 as a target for external load-balancers which use nodes as a second hop.

Signed-off-by: cyclinder <kuocyclinder@gmail.com>
@fedepaol
Copy link
Member

LGTM! Sending to merge queue

@fedepaol fedepaol added this pull request to the merge queue Jan 24, 2024
Merged via the queue into metallb:main with commit 1a8e52c Jan 24, 2024
36 checks passed
@howardjohn
Copy link

I don't think this is a bug really, but fyi - #2274. This will break any kind users (among others)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Honor node.kubernetes.io/exclude-from-external-load-balancers
4 participants