Implement NodeExcludeBalancers to exclude nodes as external loadbalancer #2073
Conversation
Force-pushed 2bf91d5 to 8a81f14
@fedepaol I'm not sure this change is in the right place. Can you check it for me? Thanks
Force-pushed 8a81f14 to 5575f39
I'd follow what we do for the network-not-available condition:
metallb/speaker/bgp_controller.go, line 169 in 1f0fdf3
metallb/speaker/layer2_controller.go, line 234 in 1f0fdf3
It should be pretty easy. Also, please add a couple of e2e tests.
Also, please fix the commit message.
@cyclinder still interested in working on this (no rush, just checking)?
@fedepaol hey, sorry for the delay. Yes, but I am swamped with work these days. I'll work on this next weekend.
no problem, I know the feeling :-)
This PR has been automatically marked as stale because it has been open for 30 days.
This problem is relevant for me.
I apologize for the delay. I will work on this in the next few days and address the CI failures. |
@cyclinder Thanks, I appreciate it 🤝
Force-pushed 5575f39 to 022aa71
Force-pushed df458cc to 7e15972
Force-pushed e601394 to efe7335
Sorry for the delay again. Before that, I added the go.work file:

```
go 1.21.4

use (
	.
	./e2etest
)
```

The previous CI failures happened because when creating a cluster with kind, it labels the control-plane node with "node.kubernetes.io/exclude-from-external-load-balancers" by default, and this PR implements the behavior of that label, causing some specs to fail. I would like to manually remove this label after starting the kind cluster (to avoid affecting existing e2e tests), and then separately add an e2e test specifically for this PR. This e2e test will include the following steps:
To avoid impacting other e2e tests, this e2e test must run serially. WDYT?
internal/config/config.go (outdated)

```go
@@ -1140,6 +1140,11 @@ func selectedNodes(nodes []corev1.Node, selectors []metav1.LabelSelector) (map[s
	res := make(map[string]bool)
OUTER:
	for _, node := range nodes {
		_, ok := node.Labels[corev1.LabelNodeExcludeBalancers]
```
this is the wrong place to add the logic, as the function is "selected nodes".
I'd rather do the filtering in the controller (when we build the resources structure), or on top of config.For, carrying around the filtered node slices.
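The controller-side filtering could be sketched roughly as follows. This is only an illustration: `Node` here is a minimal stand-in for `corev1.Node`, the label constant is hardcoded so the snippet is self-contained, and `filterExcludedNodes` is a hypothetical helper name, not MetalLB's actual code.

```go
package main

import "fmt"

// Mirrors corev1.LabelNodeExcludeBalancers; hardcoded so the sketch is self-contained.
const LabelNodeExcludeBalancers = "node.kubernetes.io/exclude-from-external-load-balancers"

// Node is a minimal stand-in for corev1.Node, carrying only what the filter needs.
type Node struct {
	Name   string
	Labels map[string]string
}

// filterExcludedNodes drops nodes carrying the exclude-from-external-load-balancers
// label before the resources structure is built, leaving selectedNodes untouched.
func filterExcludedNodes(nodes []Node) []Node {
	res := make([]Node, 0, len(nodes))
	for _, n := range nodes {
		if _, excluded := n.Labels[LabelNodeExcludeBalancers]; excluded {
			continue
		}
		res = append(res, n)
	}
	return res
}

func main() {
	nodes := []Node{
		{Name: "kind-control-plane", Labels: map[string]string{LabelNodeExcludeBalancers: ""}},
		{Name: "kind-worker2", Labels: map[string]string{}},
	}
	for _, n := range filterExcludedNodes(nodes) {
		fmt.Println(n.Name) // only "kind-worker2" survives the filter
	}
}
```

Doing the filter once, upstream of config handling, means every consumer of the node list sees the same filtered view.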
Or, you can even add the filter in the config controller's predicate.
How about the latest changes? I prefer to do these filters in the speaker's shouldAnnounce; it's simple and we can easily add logs to see what happened.
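A speaker-side check along these lines would give the logging benefit mentioned above. Note this is a simplified sketch: the real shouldAnnounce in MetalLB has a different signature and more inputs; `shouldAnnounceNode` and its `"notOwner"` return convention are illustrative assumptions here.

```go
package main

import (
	"fmt"
	"log"
)

const excludeLabel = "node.kubernetes.io/exclude-from-external-load-balancers"

// shouldAnnounceNode is a simplified stand-in for the speaker's shouldAnnounce:
// a node carrying the exclude label refuses ownership of the announcement,
// and the log line makes the decision observable.
func shouldAnnounceNode(nodeName string, labels map[string]string) string {
	if _, ok := labels[excludeLabel]; ok {
		log.Printf("node %q has %s set, refusing to announce", nodeName, excludeLabel)
		return "notOwner"
	}
	return "" // empty result means this node may announce
}

func main() {
	fmt.Println(shouldAnnounceNode("kind-worker2", map[string]string{excludeLabel: ""}))
	fmt.Println(shouldAnnounceNode("kind-worker", nil) == "")
}
```

Putting the check here keeps the node list intact everywhere else and localizes the decision (and its log line) to the component that actually announces.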
I tested it on my local machine, it works well:

```console
➜ metallb git:(exclude_nodes) ✗ kubectl describe svc -n l2-6976 external-local-lb | grep nodeAssigned
Normal nodeAssigned 24m (x6 over 45m) metallb-speaker announcing from node "kind-worker2" with protocol "layer2"
➜ metallb git:(exclude_nodes) ✗ kubectl label nodes kind-worker2 node.kubernetes.io/exclude-from-external-load-balancers=""
node/kind-worker2 labeled
➜ metallb git:(exclude_nodes) ✗ kubectl describe svc -n l2-6976 external-local-lb | grep nodeAssigned
Normal nodeAssigned 24m (x6 over 45m) metallb-speaker announcing from node "kind-worker2" with protocol "layer2"
Normal nodeAssigned 3s (x2 over 38m) metallb-speaker announcing from node "kind-control-plane" with protocol "layer2"
➜ metallb git:(exclude_nodes) ✗ kubectl label nodes kind-worker2 node.kubernetes.io/exclude-from-external-load-balancers-
node/kind-worker2 unlabeled
➜ metallb git:(exclude_nodes) ✗ kubectl describe svc -n l2-6976 external-local-lb | grep nodeAssigned
Normal nodeAssigned 55s (x2 over 39m) metallb-speaker announcing from node "kind-control-plane" with protocol "layer2"
Normal nodeAssigned 4s (x7 over 46m) metallb-speaker announcing from node "kind-worker2" with protocol "layer2"
```
Weird, I did not have any issues when opening the e2etests folder directly, with both vim and vscode. We have a hardcoded go.work file there to fetch the metallb api.
I'd do it as part of inv dev-env
Not sure I understood. Here's what I'd do:
All the e2e tests run serially.
I would also add a filter in the node controller.
Force-pushed eb17eef to 1751aab
From the CI run (https://github.com/metallb/metallb/actions/runs/7620493254/job/20755437194?pr=2073#step:6:336):

```
Jan 23 03:18:27.335: INFO: Waiting up to 2m0s for 1 pods to be running and ready: [external-local-lb-g277x]
Jan 23 03:18:29.341: INFO: Wanted all 1 pods to be running and ready. Result: true. Pods: [external-local-lb-g277x]
STEP: getting the advertising node @ 01/23/24 03:18:29.343
STEP: add the NodeExcludeBalancers label of the node @ 01/23/24 03:18:29.345
STEP: event.Message = announcing from node "kind-worker2" with protocol "layer2", event.Time = 1705979909000000000 @ 01/23/24 03:18:30.353
STEP: event.Message = announcing from node "kind-worker" with protocol "layer2", event.Time = 1705979909000000000
```

From the debug logs this looks strange: the lastTimestamp is the same for different events, which causes k8s.GetSvcNode to return an incorrect node.
Force-pushed a6bc4ec to cbbf287
@fedepaol All the tests have passed. We need the 1s sleep; otherwise, the lastTimestamp is the same for different events, which causes k8s.GetSvcNode to return an incorrect node.
Sorry if I insist, but I am probably missing something. Which GetSvcNode are you referring to? This one (the one after the sleep) should work either way, as it's wrapped by eventually? And the previous one is before the sleep, so it won't be affected. Also, if we decide the sleep is mandatory, we should add a big comment to explain why.
Please see #2073 (comment). The second one.
Now I get it! So what you are actually delaying is adding the labels, so the event has the right granularity.
If the node is labeled with "node.kubernetes.io/exclude-from-external-load-balancers", it specifies that the node should not be considered as a target for external load-balancers which use nodes as a second hop. Signed-off-by: cyclinder <kuocyclinder@gmail.com>
Force-pushed cbbf287 to 71fe989
LGTM! Sending to merge queue
I don't think this is a bug really, but fyi - #2274. This will break any
The label specifies that the node should not be considered as a target for external load-balancers which use nodes as a second hop.
Fixed #2021