@jajeffries jajeffries commented Oct 13, 2025

This pull request introduces a significant refactor to how backend state is managed and monitored in the agent. The main improvement is the introduction of a centralized StateManager for backend state, which enables more robust monitoring, error handling, and reporting, including integration with the Fleet heartbeat. The changes also propagate backend state information into heartbeats sent to Fleet, improving observability. Several interfaces and constructors are updated to support this new state management approach.

Key changes include:

Backend State Management Refactor:

  • Introduced backend.StateManager in agent/backend/backend_state.go, which centralizes backend state tracking, periodic monitoring, error registration, and restart logic. The manager uses a dedicated goroutine to monitor each backend and triggers restarts if a backend is unhealthy and the minimum restart interval has elapsed. ([agent/backend/backend_state.goR1-R111](https://github.com/netboxlabs/orb-agent/pull/201/files#diff-6301055a8fa5f25637f457f26fa4d9556ff42cfe18d83199b773f78de9332819R1-R111))
  • Replaced the direct use of backendState map in agent/agent.go with the new StateManager, updating all relevant methods to interact with the manager for backend state, error, and restart registration. ([[1]](https://github.com/netboxlabs/orb-agent/pull/201/files#diff-62b6ad581fe3a3059ae8c85ef0f31dde4092bfdecfa7d6857c470bcacaa8cc8bL35-R45), [[2]](https://github.com/netboxlabs/orb-agent/pull/201/files#diff-62b6ad581fe3a3059ae8c85ef0f31dde4092bfdecfa7d6857c470bcacaa8cc8bR63-R78), [[3]](https://github.com/netboxlabs/orb-agent/pull/201/files#diff-62b6ad581fe3a3059ae8c85ef0f31dde4092bfdecfa7d6857c470bcacaa8cc8bL81), [[4]](https://github.com/netboxlabs/orb-agent/pull/201/files#diff-62b6ad581fe3a3059ae8c85ef0f31dde4092bfdecfa7d6857c470bcacaa8cc8bL122-R157), [[5]](https://github.com/netboxlabs/orb-agent/pull/201/files#diff-62b6ad581fe3a3059ae8c85ef0f31dde4092bfdecfa7d6857c470bcacaa8cc8bL201-R212), [[6]](https://github.com/netboxlabs/orb-agent/pull/201/files#diff-62b6ad581fe3a3059ae8c85ef0f31dde4092bfdecfa7d6857c470bcacaa8cc8bL222-R231))

Fleet Integration and Heartbeats:

  • Extended heartbeats sent to Fleet to include detailed backend state, by passing the StateManager (as a StateRetriever interface) down through the config manager and MQTT connection layers, and updating the heartbeat payload to include backend status, errors, and restart information. ([[1]](https://github.com/netboxlabs/orb-agent/pull/201/files#diff-b0dddd7590e09bc17b468db04b48a21f0d508a17eca4c086eb98c87325e5ba98R23-R33), [[2]](https://github.com/netboxlabs/orb-agent/pull/201/files#diff-1783db06b73b72c164682aec902cffc352ef29c1eee8cd92929e7637e72f8047L28-R32), [[3]](https://github.com/netboxlabs/orb-agent/pull/201/files#diff-796a5a43e9dd714c20b0d1863be358a6f2b4f3d5035e9ab9c4303f61fb2f24f2R9), [[4]](https://github.com/netboxlabs/orb-agent/pull/201/files#diff-796a5a43e9dd714c20b0d1863be358a6f2b4f3d5035e9ab9c4303f61fb2f24f2R21-R29), [[5]](https://github.com/netboxlabs/orb-agent/pull/201/files#diff-796a5a43e9dd714c20b0d1863be358a6f2b4f3d5035e9ab9c4303f61fb2f24f2R43-R45), [[6]](https://github.com/netboxlabs/orb-agent/pull/201/files#diff-796a5a43e9dd714c20b0d1863be358a6f2b4f3d5035e9ab9c4303f61fb2f24f2R61-R76))

Testing and Mocking Updates:

  • Updated tests and mocks to accommodate the new backend state interface, including providing mock implementations and updating constructors in test files. ([[1]](https://github.com/netboxlabs/orb-agent/pull/201/files#diff-e8832cdc90dafca1118e69654b2c59f5c6c3e40fbee158eaafba88b3d7243232R57-R73), [[2]](https://github.com/netboxlabs/orb-agent/pull/201/files#diff-e8832cdc90dafca1118e69654b2c59f5c6c3e40fbee158eaafba88b3d7243232L79-R90), [[3]](https://github.com/netboxlabs/orb-agent/pull/201/files#diff-61bcdb2570d98db360b12b22aede9c54e0c453085acd5e60d5902c8f92380e72R17), [[4]](https://github.com/netboxlabs/orb-agent/pull/201/files#diff-61bcdb2570d98db360b12b22aede9c54e0c453085acd5e60d5902c8f92380e72R38-R48), [[5]](https://github.com/netboxlabs/orb-agent/pull/201/files#diff-61bcdb2570d98db360b12b22aede9c54e0c453085acd5e60d5902c8f92380e72L57-R70))
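For readers unfamiliar with the pattern, a hand-written test double for a state interface might look like the sketch below. The interface shape, the `mockStateRetriever` type, and the `unhealthyBackends` consumer are hypothetical; the mocks actually added in this PR may be generated or structured differently.

```go
package main

import (
	"fmt"
	"sort"
)

// BackendStateInfo and StateRetriever are assumed shapes for illustration.
type BackendStateInfo struct {
	Status string
	Error  string
}

type StateRetriever interface {
	Get() map[string]BackendStateInfo
}

// mockStateRetriever returns canned state and counts how often it is queried.
type mockStateRetriever struct {
	States map[string]BackendStateInfo
	Calls  int
}

func (m *mockStateRetriever) Get() map[string]BackendStateInfo {
	m.Calls++
	return m.States
}

// unhealthyBackends is a hypothetical consumer under test: it returns the
// sorted names of backends whose status is not "running".
func unhealthyBackends(sr StateRetriever) []string {
	var names []string
	for name, st := range sr.Get() {
		if st.Status != "running" {
			names = append(names, name)
		}
	}
	sort.Strings(names)
	return names
}

func main() {
	mock := &mockStateRetriever{States: map[string]BackendStateInfo{
		"pktvisor":         {Status: "running"},
		"device_discovery": {Status: "error", Error: "exited"},
	}}
	fmt.Println(unhealthyBackends(mock), "calls:", mock.Calls)
}
```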

These changes lay the groundwork for more reliable backend lifecycle management and improved system observability.

Backend State Management:

  • Added backend.StateManager to centralize backend state tracking, monitoring, and restart logic, replacing per-backend state maps. ([agent/backend/backend_state.goR1-R111](https://github.com/netboxlabs/orb-agent/pull/201/files#diff-6301055a8fa5f25637f457f26fa4d9556ff42cfe18d83199b773f78de9332819R1-R111))
  • Updated agent/agent.go to use StateManager for backend lifecycle events, error registration, and restart handling. ([[1]](https://github.com/netboxlabs/orb-agent/pull/201/files#diff-62b6ad581fe3a3059ae8c85ef0f31dde4092bfdecfa7d6857c470bcacaa8cc8bL35-R45), [[2]](https://github.com/netboxlabs/orb-agent/pull/201/files#diff-62b6ad581fe3a3059ae8c85ef0f31dde4092bfdecfa7d6857c470bcacaa8cc8bR63-R78), [[3]](https://github.com/netboxlabs/orb-agent/pull/201/files#diff-62b6ad581fe3a3059ae8c85ef0f31dde4092bfdecfa7d6857c470bcacaa8cc8bL81), [[4]](https://github.com/netboxlabs/orb-agent/pull/201/files#diff-62b6ad581fe3a3059ae8c85ef0f31dde4092bfdecfa7d6857c470bcacaa8cc8bL122-R157), [[5]](https://github.com/netboxlabs/orb-agent/pull/201/files#diff-62b6ad581fe3a3059ae8c85ef0f31dde4092bfdecfa7d6857c470bcacaa8cc8bL201-R212), [[6]](https://github.com/netboxlabs/orb-agent/pull/201/files#diff-62b6ad581fe3a3059ae8c85ef0f31dde4092bfdecfa7d6857c470bcacaa8cc8bL222-R231))

Fleet Heartbeat and Config Integration:

  • Passed StateManager through config and MQTT layers to allow heartbeats to report backend state to Fleet. ([[1]](https://github.com/netboxlabs/orb-agent/pull/201/files#diff-b0dddd7590e09bc17b468db04b48a21f0d508a17eca4c086eb98c87325e5ba98R23-R33), [[2]](https://github.com/netboxlabs/orb-agent/pull/201/files#diff-1783db06b73b72c164682aec902cffc352ef29c1eee8cd92929e7637e72f8047L28-R32))
  • Extended heartbeat payloads to include backend state information, improving Fleet-side observability. ([[1]](https://github.com/netboxlabs/orb-agent/pull/201/files#diff-796a5a43e9dd714c20b0d1863be358a6f2b4f3d5035e9ab9c4303f61fb2f24f2R9), [[2]](https://github.com/netboxlabs/orb-agent/pull/201/files#diff-796a5a43e9dd714c20b0d1863be358a6f2b4f3d5035e9ab9c4303f61fb2f24f2R21-R29), [[3]](https://github.com/netboxlabs/orb-agent/pull/201/files#diff-796a5a43e9dd714c20b0d1863be358a6f2b4f3d5035e9ab9c4303f61fb2f24f2R43-R45), [[4]](https://github.com/netboxlabs/orb-agent/pull/201/files#diff-796a5a43e9dd714c20b0d1863be358a6f2b4f3d5035e9ab9c4303f61fb2f24f2R61-R76))

Testing Improvements:

  • Updated tests and mocks to work with the new backend state interface, ensuring test coverage for the new architecture. ([[1]](https://github.com/netboxlabs/orb-agent/pull/201/files#diff-e8832cdc90dafca1118e69654b2c59f5c6c3e40fbee158eaafba88b3d7243232R57-R73), [[2]](https://github.com/netboxlabs/orb-agent/pull/201/files#diff-e8832cdc90dafca1118e69654b2c59f5c6c3e40fbee158eaafba88b3d7243232L79-R90), [[3]](https://github.com/netboxlabs/orb-agent/pull/201/files#diff-61bcdb2570d98db360b12b22aede9c54e0c453085acd5e60d5902c8f92380e72R17), [[4]](https://github.com/netboxlabs/orb-agent/pull/201/files#diff-61bcdb2570d98db360b12b22aede9c54e0c453085acd5e60d5902c8f92380e72R38-R48), [[5]](https://github.com/netboxlabs/orb-agent/pull/201/files#diff-61bcdb2570d98db360b12b22aede9c54e0c453085acd5e60d5902c8f92380e72L57-R70))

This pull request introduces a new BackendStateManager to centralize and improve backend state tracking, error handling, and automated restart logic. It refactors how backend state is managed, replacing direct state maps with a dedicated manager, and enhances the reporting of backend status in fleet heartbeats. The changes also ensure that backend state is accessible throughout various components, including configuration management and fleet communication.

Backend state management and monitoring:

  • Introduced BackendStateManager in agent/backend/backend_state.go to handle backend state, monitor backend health, trigger restarts when necessary, and provide a unified interface for accessing backend state. This includes automated backend monitoring, error registration, and restart logic.
  • Refactored orbAgent in agent/agent.go to use BackendStateManager instead of a local backendState map, and to start backend monitors and handle restart requests through a channel.

Integration with configuration management and fleet communication:

  • Updated configmgr and fleet-related components to accept and use the new BackendStateManager for backend state, including changes to constructors and dependency injection in agent/configmgr/fleet.go and agent/configmgr/fleet/connection.go.
  • Modified the heartbeater in agent/configmgr/fleet/heartbeats.go to include backend state in heartbeat messages, providing richer status information to fleet.

Testing and mocks:

  • Updated and extended tests and mocks to support the new backend state interface and manager, ensuring continued test coverage and correct initialization in agent/configmgr/fleet/connection_test.go and agent/configmgr/fleet/heartbeats_test.go.

github-actions bot commented Oct 13, 2025

Go test coverage

| Status | Elapsed | Package | Cover | Pass | Fail | Skip |
| --- | --- | --- | --- | --- | --- | --- |
| 🟢 PASS | 0.01s | github.com/netboxlabs/orb-agent/agent | 0.0% | 0 | 0 | 0 |
| 🟢 PASS | 1.11s | github.com/netboxlabs/orb-agent/agent/backend | 33.3% | 30 | 0 | 0 |
| 🟢 PASS | 4.03s | github.com/netboxlabs/orb-agent/agent/backend/devicediscovery | 79.4% | 2 | 0 | 0 |
| 🟢 PASS | 0.01s | github.com/netboxlabs/orb-agent/agent/backend/mocks | 0.0% | 0 | 0 | 0 |
| 🟢 PASS | 4.03s | github.com/netboxlabs/orb-agent/agent/backend/networkdiscovery | 80.6% | 2 | 0 | 0 |
| 🟢 PASS | 4.02s | github.com/netboxlabs/orb-agent/agent/backend/opentelemetryinfinity | 74.1% | 2 | 0 | 0 |
| 🟢 PASS | 4.03s | github.com/netboxlabs/orb-agent/agent/backend/pktvisor | 71.7% | 2 | 0 | 0 |
| 🟢 PASS | 4.03s | github.com/netboxlabs/orb-agent/agent/backend/snmpdiscovery | 80.2% | 2 | 0 | 0 |
| 🟢 PASS | 5.03s | github.com/netboxlabs/orb-agent/agent/backend/worker | 80.6% | 3 | 0 | 0 |
| 🟢 PASS | 1.01s | github.com/netboxlabs/orb-agent/agent/config | 100.0% | 6 | 0 | 0 |
| 🟢 PASS | 31.18s | github.com/netboxlabs/orb-agent/agent/configmgr | 49.8% | 14 | 0 | 0 |
| 🟢 PASS | 4.53s | github.com/netboxlabs/orb-agent/agent/configmgr/fleet | 69.6% | 92 | 0 | 0 |
| 🟢 PASS | 1.01s | github.com/netboxlabs/orb-agent/agent/policies | 100.0% | 15 | 0 | 0 |
| 🟢 PASS | 1.03s | github.com/netboxlabs/orb-agent/agent/policymgr | 70.3% | 10 | 0 | 0 |
| 🟢 PASS | 24.18s | github.com/netboxlabs/orb-agent/agent/secretsmgr | 48.9% | 54 | 0 | 0 |
| 🟢 PASS | 1.02s | github.com/netboxlabs/orb-agent/agent/version | 100.0% | 1 | 0 | 0 |

Total coverage: 62.8%

@jajeffries jajeffries marked this pull request as ready for review October 13, 2025 15:24
@leoparente leoparente left a comment

Overall LGTM

@jajeffries jajeffries merged commit c0c00f0 into develop Oct 14, 2025
5 checks passed
@jajeffries jajeffries deleted the feat/OBS-1489-heartbeats branch October 14, 2025 09:45
github-actions bot commented Nov 3, 2025

🎉 This PR is included in version 2.5.0 🎉

The release is available on GitHub release

Your semantic-release bot 📦🚀

github-actions bot commented Nov 4, 2025

🎉 This PR is included in version 2.5.0 🎉

The release is available on GitHub release

Your semantic-release bot 📦🚀
