Unnecessary NetworkInterface reconciliations triggered by Node heartbeat updates #423

@sujeet01

Describe the bug
The metalnetlet NetworkInterface controller frequently reconciles all NetworkInterfaces even when nothing relevant has changed. Two causes are involved, a status-update reconciliation loop and an unfiltered Node watch, and together they put avoidable CPU/memory load on the controller.

Key Impact:

  • Frequency: NetworkInterface reconciliations are triggered every few seconds to minutes, driven by the two root causes below.
  • Scale: Each Node update triggers reconciliation of all NetworkInterfaces assigned to that node (57 in the sample log below).
  • Infinite Loop: Status updates re-trigger the reconciliation that produced them.
  • Performance: Avoidable CPU/memory usage and controller load.

To Reproduce

  1. Deploy metalnetlet controller
  2. Create a network and multiple NetworkInterface resources
  3. Monitor the metalnetlet controller logs
  4. Observe continuous NIC reconciliation triggers

Root Causes

Root Cause 1: Infinite Reconciliation Loop
The v1alpha1.NetworkInterface controller updates the status of the metalnet.NetworkInterface even when neither spec nor status has actually changed. That update fires the metalnet.NetworkInterface watcher, which enqueues another v1alpha1.NetworkInterface reconciliation (via EnqueueRequestForSource), and the cycle repeats indefinitely.

Root Cause 2: Node Heartbeat Updates
The metalnetlet NetworkInterface controller watches corev1.Node objects without any predicate, so every Node change (metadata, spec, or status condition update) triggers reconciliation of all NetworkInterfaces associated with that node. In particular, the frequent updates to node.status.conditions[].lastHeartbeatTime cause these unnecessary triggers.

Expected behavior
NetworkInterface reconciliations should be triggered only by actual spec/status changes that affect NetworkInterface configuration. They should NOT be triggered by:

  • no-op status updates that create infinite loops
  • every node heartbeat update (which only changes status timestamps)

Questions

  • Do we really need to watch the Node object to trigger NIC reconciliations? Or should we remove the Node watcher entirely, since Node doesn't carry any NIC-related configuration?
  • What Node changes should legitimately trigger NetworkInterface reconciliation?

Probable Solutions

For Root Cause 1 (Infinite Loop):
Add predicates to the metalnet.NetworkInterface watcher that filter out no-op status updates, so that only actual spec/status changes trigger reconciliation.
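
A minimal sketch of the comparison such a predicate could perform, assuming the no-op updates differ only in metadata such as resourceVersion (the issue does not establish which fields actually change). The types nicObject, nicSpec and nicStatus and the function nicUpdateIsRelevant are hypothetical stand-ins, not the real metalnet API. In controller-runtime, this logic would typically live in the UpdateFunc of a predicate.Funcs attached to the watch.

```go
package main

import (
	"fmt"
	"reflect"
)

// nicSpec and nicStatus are hypothetical, heavily simplified stand-ins
// for the spec and status of a metalnet NetworkInterface.
type nicSpec struct {
	NetworkRef string
	IPs        []string
}

type nicStatus struct {
	State      string
	PCIAddress string
}

// nicObject is a hypothetical stand-in for the watched object.
type nicObject struct {
	ResourceVersion string
	Spec            nicSpec
	Status          nicStatus
}

// nicUpdateIsRelevant reports whether an update event should enqueue a
// reconciliation. It ignores ResourceVersion, so updates that change
// nothing in spec or status are dropped.
// Note: reflect.DeepEqual treats a nil slice and an empty slice as
// different; real Kubernetes code often uses a semantic equality helper.
func nicUpdateIsRelevant(oldObj, newObj nicObject) bool {
	if !reflect.DeepEqual(oldObj.Spec, newObj.Spec) {
		return true
	}
	return !reflect.DeepEqual(oldObj.Status, newObj.Status)
}

func main() {
	before := nicObject{
		ResourceVersion: "100",
		Spec:            nicSpec{NetworkRef: "net-a", IPs: []string{"10.0.0.1"}},
		Status:          nicStatus{State: "Ready"},
	}

	// Same spec and status, only the resource version moved.
	noop := before
	noop.ResourceVersion = "101"
	fmt.Println(nicUpdateIsRelevant(before, noop)) // false

	// A real status change must still trigger reconciliation.
	changed := noop
	changed.Status.State = "Error"
	fmt.Println(nicUpdateIsRelevant(before, changed)) // true
}
```

A predicate like this only stops the watcher from re-enqueuing. Skipping the status write in the controller when the computed status equals the current one would address the same loop at its source.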

For Root Cause 2 (Node Heartbeat):
Add a custom predicate that filters out Node updates which only change heartbeat timestamps.

Additional context
The following sample debug log output shows how frequently the reconciliations are triggered:

Infinite Loop Triggers:

2025-07-18T10:58:34Z	DEBUG	eventhandler.enqueueRequestForSource	SourceAware watcher Update triggered	{"sourceObject": {"name":"1a06fdb5-0ab8-49ee-85e2-5126845d215a","namespace":"metalnet-system"}, "sourceGVK": "core.apinet.ironcore.dev/v1alpha1, Kind=NetworkInterface", "oldResourceVersion": "2200162046", "newResourceVersion": "220062140", "enqueuedCount": 1}
2025-07-18T10:58:34Z	DEBUG	eventhandler.enqueueRequestForSource	SourceAware watcher Update triggered	{"sourceObject": {"name":"57a43606-d4e4-4ebf-a3fb-12d0b88a1b0d","namespace":"metalnet-system"}, "sourceGVK": "core.apinet.ironcore.dev/v1alpha1, Kind=NetworkInterface", "oldResourceVersion": "2200162048", "newResourceVersion": "220062141", "enqueuedCount": 1}
2025-07-18T10:58:35Z	DEBUG	eventhandler.enqueueRequestForSource	SourceAware watcher Update triggered	{"sourceObject": {"name":"a5f3aa39-cde1-470c-8830-12e4935ffd78","namespace":"metalnet-system"}, "sourceGVK": "core.apinet.ironcore.dev/v1alpha1, Kind=NetworkInterface", "oldResourceVersion": "2200162049", "newResourceVersion": "220062143", "enqueuedCount": 1}
2025-07-18T10:58:35Z	DEBUG	eventhandler.enqueueRequestForSource	SourceAware watcher Update triggered	{"sourceObject": {"name":"4fcb4ab4-ac3d-485f-8390-012de4008fbf","namespace":"metalnet-system"}, "sourceGVK": "core.apinet.ironcore.dev/v1alpha1, Kind=NetworkInterface", "oldResourceVersion": "2200162051", "newResourceVersion": "220062150", "enqueuedCount": 1}

Node Heartbeat Triggers:

2025-07-21T16:23:03Z	DEBUG	MetalnetNode watcher enqueuing network interfaces	{"metalnetNode": {"name":"worker-n1"}, "nodeName": "node1.worker-n1", "enqueuedCount": 57}
2025-07-21T16:23:03Z	DEBUG	MetalnetNode watcher enqueuing network interfaces	{"metalnetNode": {"name":"worker-n1"}, "nodeName": "node1.worker-n1", "enqueuedCount": 57}
2025-07-21T16:24:13Z	DEBUG	MetalnetNode watcher enqueuing network interfaces	{"metalnetNode": {"name":"worker-n1"}, "nodeName": "node1.worker-n1", "enqueuedCount": 57}
2025-07-21T16:24:13Z	DEBUG	MetalnetNode watcher enqueuing network interfaces	{"metalnetNode": {"name":"worker-n1"}, "nodeName": "node1.worker-n1", "enqueuedCount": 57}

Labels: area/networking, bug
