Description
Describe the bug
The metalnetlet NetworkInterface controller frequently reconciles all NetworkInterfaces without need. The triggers are a reconciliation loop caused by status updates and unfiltered Node update events, which together put avoidable CPU/memory load on the controller.
Key Impact:
- Frequency: NetworkInterface reconciliations are triggered every few seconds/minutes due to multiple root causes.
- Scale: Each trigger causes reconciliation of all NetworkInterfaces assigned to that node.
- Infinite Loop: Status updates create reconciliation loops.
- Performance: Unnecessary CPU/memory usage and controller load.
To Reproduce
- Deploy the metalnetlet controller
- Create a network and multiple NetworkInterface resources
- Monitor the metalnetlet controller logs
- Observe continuous NIC reconciliation triggers
Root Causes
Root Cause 1: Infinite Reconciliation Loop
The v1alpha1.NetworkInterface controller updates the status of the metalnet.NetworkInterface even when neither spec nor status has actually changed. That update fires the metalnet.NetworkInterface watcher, which enqueues another v1alpha1.NetworkInterface reconciliation (via EnqueueRequestForSource), creating an infinite loop.
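One way to break the loop on the write side is to skip the status update when nothing changed. The following is a minimal sketch of that guard, not the controller's actual code: `nicStatus`, `statusWriter`, and `updateStatusIfChanged` are hypothetical names, and the status fields are placeholders for whatever the real metalnet.NetworkInterface status contains.

```go
package main

import (
	"fmt"
	"reflect"
)

// nicStatus is a hypothetical, simplified stand-in for the status of a
// metalnet NetworkInterface; the real type has different fields.
type nicStatus struct {
	State string
	IPs   []string
}

// statusWriter is a hypothetical abstraction over the API call that
// persists a status (for example a status update or patch).
type statusWriter func(nicStatus) error

// updateStatusIfChanged issues a write only when desired differs from
// current. It reports whether a write was issued.
func updateStatusIfChanged(current, desired nicStatus, write statusWriter) (bool, error) {
	if reflect.DeepEqual(current, desired) {
		// Nothing changed: skip the write so the watcher is not woken up.
		return false, nil
	}
	if err := write(desired); err != nil {
		return false, err
	}
	return true, nil
}

func main() {
	writes := 0
	write := func(nicStatus) error { writes++; return nil }

	current := nicStatus{State: "Ready", IPs: []string{"10.0.0.1"}}
	same := nicStatus{State: "Ready", IPs: []string{"10.0.0.1"}}
	changed := nicStatus{State: "Pending"}

	wrote, _ := updateStatusIfChanged(current, same, write)
	fmt.Println(wrote, writes) // false 0

	wrote, _ = updateStatusIfChanged(current, changed, write)
	fmt.Println(wrote, writes) // true 1
}
```

Note that `reflect.DeepEqual` treats a nil slice and an empty slice as different, so a real implementation may want a semantic comparison instead.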
Root Cause 2: Node Heartbeat Updates
The metalnetlet's NetworkInterface controller watches corev1.Node objects. Because the watch has no predicate, every node change (metadata, spec, or status condition update) triggers reconciliation of all NetworkInterfaces associated with that node. In particular, the frequent updates to node.status.conditions[].lastHeartbeatTime cause these unnecessary reconciliation triggers.
Expected behavior
NetworkInterface reconciliations should only be triggered when there are actual spec/status changes that affect NetworkInterface configuration:
- NOT on unnecessary status updates that create infinite loops.
- NOT on every node heartbeat update (which only changes status timestamps).
Questions
- Do we really need to watch the Node object and trigger NIC reconciliations? Or should we remove the Node watcher entirely, since Node doesn't have any NIC-related configuration?
- What Node changes should legitimately trigger NetworkInterface reconciliation?
Probable Solutions
For Root Cause 1 (Infinite Loop):
Add predicates to the metalnet.NetworkInterface watcher that filter out no-op status updates, so that only actual spec/status changes trigger reconciliation.
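The decision such a predicate would make on an update event could look roughly like the sketch below. It is written against a hypothetical, trimmed-down `object` type so that it runs on its own. It also assumes that `Generation` is bumped only on spec changes, which is how custom resources with the status subresource enabled behave. In the real controller the same comparison would live in the `UpdateFunc` of a controller-runtime `predicate.Funcs`. For the spec half alone, controller-runtime already ships `predicate.GenerationChangedPredicate`.

```go
package main

import (
	"fmt"
	"reflect"
)

// object is a hypothetical, trimmed-down view of a watched resource.
type object struct {
	Generation      int64             // assumed to change only on spec changes
	ResourceVersion string            // changes on every write, so it is ignored
	Status          map[string]string // placeholder for the real status struct
}

// shouldEnqueue reports whether an update event carries a spec or status
// change worth reconciling. A write that only moved ResourceVersion is dropped.
func shouldEnqueue(oldObj, newObj object) bool {
	if oldObj.Generation != newObj.Generation {
		return true // spec changed
	}
	return !reflect.DeepEqual(oldObj.Status, newObj.Status)
}

func main() {
	base := object{Generation: 1, ResourceVersion: "100", Status: map[string]string{"state": "Ready"}}

	noop := object{Generation: 1, ResourceVersion: "101", Status: map[string]string{"state": "Ready"}}
	specChange := object{Generation: 2, ResourceVersion: "102", Status: map[string]string{"state": "Ready"}}
	statusChange := object{Generation: 1, ResourceVersion: "103", Status: map[string]string{"state": "Error"}}

	fmt.Println(shouldEnqueue(base, noop))         // false
	fmt.Println(shouldEnqueue(base, specChange))   // true
	fmt.Println(shouldEnqueue(base, statusChange)) // true
}
```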
For Root Cause 2 (Node Heartbeat):
Add a custom predicate to filter out Node heartbeat changes.
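As a rough illustration of what such a heartbeat filter would compare, the sketch below zeroes the heartbeat timestamp on both sides before diffing the conditions. The `node` and `nodeCondition` types are simplified stand-ins for corev1.Node and corev1.NodeCondition. Treating labels and the remaining condition fields as relevant is an assumption made here, since which Node changes actually matter is still an open question above.

```go
package main

import (
	"fmt"
	"reflect"
	"time"
)

// nodeCondition is a simplified stand-in for corev1.NodeCondition.
type nodeCondition struct {
	Type              string
	Status            string
	LastHeartbeatTime time.Time
}

// node is a simplified stand-in for corev1.Node.
type node struct {
	Labels     map[string]string
	Conditions []nodeCondition
}

// withoutHeartbeats returns a copy of conds with LastHeartbeatTime zeroed,
// leaving the input untouched.
func withoutHeartbeats(conds []nodeCondition) []nodeCondition {
	out := make([]nodeCondition, len(conds))
	for i, c := range conds {
		c.LastHeartbeatTime = time.Time{}
		out[i] = c
	}
	return out
}

// nodeChangedIgnoringHeartbeat reports whether anything other than the
// heartbeat timestamps differs between the two node versions.
func nodeChangedIgnoringHeartbeat(oldNode, newNode node) bool {
	if !reflect.DeepEqual(oldNode.Labels, newNode.Labels) {
		return true
	}
	return !reflect.DeepEqual(
		withoutHeartbeats(oldNode.Conditions),
		withoutHeartbeats(newNode.Conditions),
	)
}

func main() {
	t0 := time.Date(2025, 7, 21, 16, 23, 3, 0, time.UTC)
	oldNode := node{Conditions: []nodeCondition{{Type: "Ready", Status: "True", LastHeartbeatTime: t0}}}

	heartbeat := node{Conditions: []nodeCondition{{Type: "Ready", Status: "True", LastHeartbeatTime: t0.Add(70 * time.Second)}}}
	notReady := node{Conditions: []nodeCondition{{Type: "Ready", Status: "False", LastHeartbeatTime: t0.Add(70 * time.Second)}}}

	fmt.Println(nodeChangedIgnoringHeartbeat(oldNode, heartbeat)) // false
	fmt.Println(nodeChangedIgnoringHeartbeat(oldNode, notReady))  // true
}
```

With a filter like this in the Node watch's `UpdateFunc`, a heartbeat-only update would no longer enqueue every NetworkInterface on the node.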
Additional context
The following is a sample debug log output showing the frequent reconciliation triggers occurring every few minutes:
Infinite Loop Triggers:
2025-07-18T10:58:34Z DEBUG eventhandler.enqueueRequestForSource SourceAware watcher Update triggered {"sourceObject": {"name":"1a06fdb5-0ab8-49ee-85e2-5126845d215a","namespace":"metalnet-system"}, "sourceGVK": "core.apinet.ironcore.dev/v1alpha1, Kind=NetworkInterface", "oldResourceVersion": "2200162046", "newResourceVersion": "220062140", "enqueuedCount": 1}
2025-07-18T10:58:34Z DEBUG eventhandler.enqueueRequestForSource SourceAware watcher Update triggered {"sourceObject": {"name":"57a43606-d4e4-4ebf-a3fb-12d0b88a1b0d","namespace":"metalnet-system"}, "sourceGVK": "core.apinet.ironcore.dev/v1alpha1, Kind=NetworkInterface", "oldResourceVersion": "2200162048", "newResourceVersion": "220062141", "enqueuedCount": 1}
2025-07-18T10:58:35Z DEBUG eventhandler.enqueueRequestForSource SourceAware watcher Update triggered {"sourceObject": {"name":"a5f3aa39-cde1-470c-8830-12e4935ffd78","namespace":"metalnet-system"}, "sourceGVK": "core.apinet.ironcore.dev/v1alpha1, Kind=NetworkInterface", "oldResourceVersion": "2200162049", "newResourceVersion": "220062143", "enqueuedCount": 1}
2025-07-18T10:58:35Z DEBUG eventhandler.enqueueRequestForSource SourceAware watcher Update triggered {"sourceObject": {"name":"4fcb4ab4-ac3d-485f-8390-012de4008fbf","namespace":"metalnet-system"}, "sourceGVK": "core.apinet.ironcore.dev/v1alpha1, Kind=NetworkInterface", "oldResourceVersion": "2200162051", "newResourceVersion": "220062150", "enqueuedCount": 1}
Node Heartbeat Triggers:
2025-07-21T16:23:03Z DEBUG MetalnetNode watcher enqueuing network interfaces {"metalnetNode": {"name":"worker-n1"}, "nodeName": "node1.worker-n1", "enqueuedCount": 57}
2025-07-21T16:23:03Z DEBUG MetalnetNode watcher enqueuing network interfaces {"metalnetNode": {"name":"worker-n1"}, "nodeName": "node1.worker-n1", "enqueuedCount": 57}
2025-07-21T16:24:13Z DEBUG MetalnetNode watcher enqueuing network interfaces {"metalnetNode": {"name":"worker-n1"}, "nodeName": "node1.worker-n1", "enqueuedCount": 57}
2025-07-21T16:24:13Z DEBUG MetalnetNode watcher enqueuing network interfaces {"metalnetNode": {"name":"worker-n1"}, "nodeName": "node1.worker-n1", "enqueuedCount": 57}