-
Notifications
You must be signed in to change notification settings - Fork 1k
Description
In issues #1562 and #1606 , there were frequent ConfigReconciler calls caused by frequent Node changes. That was fixed by #1607 by adding a filters inside the ConfigReconciler that ignored some Node changes. On the bright side, I can confirm that fix worked for ConfigReconciler!
But I think we need to apply a similar filter to the NodeReconciler:
metallb/internal/k8s/controllers/node_controller.go
Lines 67 to 76 in a826be5
| func (r *NodeReconciler) SetupWithManager(mgr ctrl.Manager) error { | |
| p := predicate.NewPredicateFuncs( | |
| func(obj client.Object) bool { | |
| node, ok := obj.(*v1.Node) | |
| if !ok { | |
| level.Error(r.Logger).Log("controller", "NodeReconciler", "error", "object is not node", "name", obj.GetName()) | |
| return false | |
| } | |
| return node.Name == r.NodeName | |
| }) |
so that this doesn't flood the logs:
{"caller":"node_controller.go:42","controller":"NodeReconciler","level":"info","start reconcile":"/n4-mac","ts":"2022-10-12T14:59:50Z"}
{"caller":"speakerlist.go:271","level":"info","msg":"triggering discovery","op":"memberDiscovery","ts":"2022-10-12T14:59:50Z"}
{"caller":"node_controller.go:64","controller":"NodeReconciler","end reconcile":"/n4-mac","level":"info","ts":"2022-10-12T14:59:50Z"}
{"caller":"node_controller.go:42","controller":"NodeReconciler","level":"info","start reconcile":"/n4-mac","ts":"2022-10-12T15:00:00Z"}
{"caller":"speakerlist.go:271","level":"info","msg":"triggering discovery","op":"memberDiscovery","ts":"2022-10-12T15:00:00Z"}
{"caller":"node_controller.go:64","controller":"NodeReconciler","end reconcile":"/n4-mac","level":"info","ts":"2022-10-12T15:00:00Z"}
I don't have hard proof, but I have seen very rare cases where metallb mistakenly triggers a failover of the VIP due to one of the node change events, even though none of the nodes appeared to have had any issues.
I think what we should learn from #1606 is that ALL node watching code needs some kind of filter, because nodes change very frequently under normal conditions (maybe the conditions or the images sections?).
What kind of node update events does the NodeReconciler really care about? Nodes being added or deleted? Or also nodes being NotReady/Cordoned? I think we should put those checks inside the filter here to save CPU, reduce log spam, and avoid risks of errant failovers.