Bug report
Two related defects in ActionStatusBridgeNode that make its integration test flaky and mis-attribute faults.
1. Faults attributed to the DDS discovery placeholder node name
server_fqn_for_action() resolves the action server's node FQN from get_publishers_info_by_topic(<action>/_action/status) and only skips publishers whose node_name() is empty. During DDS discovery the participant is known before its node name/namespace propagate, so rcl returns the placeholders _NODE_NAME_UNKNOWN_ / _NODE_NAMESPACE_UNKNOWN_ (non-empty), and the bridge builds source_id = "_NODE_NAMESPACE_UNKNOWN_/_NODE_NAME_UNKNOWN_".
reporter_for() then caches the FaultReporter built from that unresolved source_id for the node's lifetime, so faults stay permanently attributed to the placeholder even after discovery completes.
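The placeholder check described above could look roughly like the following. This is a minimal sketch, not the bridge's actual code: `is_resolved_endpoint` and `fqn_from_names` are hypothetical helper names, and the sketch takes plain strings rather than the rclcpp endpoint type so that it stands alone. The two placeholder strings are the ones quoted in this report.

```cpp
#include <optional>
#include <string>
#include <string_view>

// Placeholder values seen while DDS discovery is incomplete (as quoted in this report).
constexpr std::string_view kUnknownNodeName = "_NODE_NAME_UNKNOWN_";
constexpr std::string_view kUnknownNodeNamespace = "_NODE_NAMESPACE_UNKNOWN_";

// Hypothetical helper: true only when the name is non-empty and neither part is a placeholder.
inline bool is_resolved_endpoint(std::string_view node_name, std::string_view node_namespace) {
  return !node_name.empty() && node_name != kUnknownNodeName &&
         node_namespace != kUnknownNodeNamespace;
}

// Hypothetical helper: builds "/ns/name", or returns nullopt while the endpoint is unresolved.
inline std::optional<std::string> fqn_from_names(std::string_view node_name,
                                                 std::string_view node_namespace) {
  if (!is_resolved_endpoint(node_name, node_namespace)) {
    return std::nullopt;
  }
  std::string fqn(node_namespace);
  if (fqn.empty() || fqn.front() != '/') {
    fqn.insert(fqn.begin(), '/');  // normalise to a leading slash
  }
  if (fqn.back() != '/') {
    fqn.push_back('/');  // separator between namespace and node name
  }
  fqn.append(node_name);
  return fqn;
}
```

Returning `std::optional` rather than an empty string makes the "not resolved yet" state explicit, so a caller cannot accidentally build a `source_id` from it.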
2. Missing destructor for subscriptions and rescan timer
ActionStatusBridgeNode holds status subscriptions and a rescan timer whose callbacks capture this, but declares no destructor. On teardown those callbacks can fire on a partially destroyed node and crash (SIGABRT).
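The teardown order such a destructor would enforce can be illustrated without rclcpp. In the sketch below, `CallbackHandle` is a hypothetical stand-in for the timer and subscription handles the node holds, and `BridgeSketch` stands in for the node. The destructor body releases the callback owners while every other member is still alive. It does not by itself stop a callback that an executor thread is already running, so treat it as an illustration of ordering only.

```cpp
#include <functional>
#include <map>
#include <memory>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-in for a timer or subscription handle owning a callback that captures `this`.
struct CallbackHandle {
  CallbackHandle(std::string l, std::vector<std::string>* lg, std::function<void()> cb)
      : label(std::move(l)), log(lg), callback(std::move(cb)) {}
  ~CallbackHandle() { log->push_back("released " + label); }

  std::string label;
  std::vector<std::string>* log;
  std::function<void()> callback;
};

class BridgeSketch {
 public:
  explicit BridgeSketch(std::vector<std::string>* log) : log_(log) {
    rescan_timer_ =
        std::make_shared<CallbackHandle>("rescan_timer", log_, [this] { ++rescans_; });
  }

  // Release callback owners first, while the members they touch still exist.
  ~BridgeSketch() {
    rescan_timer_.reset();
    status_subs_.clear();
    log_->push_back("members still intact");
  }

  void watch(const std::string& action) {
    status_subs_[action] =
        std::make_shared<CallbackHandle>("status_sub " + action, log_, [this] { ++statuses_; });
  }

 private:
  std::vector<std::string>* log_;
  std::shared_ptr<CallbackHandle> rescan_timer_;
  std::map<std::string, std::shared_ptr<CallbackHandle>> status_subs_;
  int rescans_ = 0;
  int statuses_ = 0;
};
```

Without the explicit destructor, members are destroyed in reverse declaration order, so a callback owner could outlive state that its callback reads.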
Steps to reproduce
- Start an action server and the action_status_bridge.
- Trigger an action result the bridge maps to a fault (e.g. ABORTED) shortly after startup, while DDS discovery is still settling.
- Observe the fault's reporting_sources (placeholder instead of the server FQN), and intermittent crashes on shutdown.
Expected behavior
- reporting_sources contains the action server node FQN (e.g. /test_action_status_client).
- Clean shutdown.
Actual behavior
- reporting_sources contains _NODE_NAMESPACE_UNKNOWN_/_NODE_NAME_UNKNOWN_ and never corrects.
- Intermittent SIGABRT on teardown.
Both surface as flaky failures of test_integration.
Environment
- ros2_medkit version: current main
- ROS 2 distro: Jazzy / Humble / Lyrical
- OS: Ubuntu 24.04
Additional information
Fix: treat the placeholder as unresolved in server_fqn_from_endpoint() and do not permanently cache a reporter built from an unresolved FQN (re-resolve until the real node name appears); add a destructor that resets the rescan timer and subscriptions before the node is destroyed.
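The caching half of the fix might be sketched as follows. `ReporterSketch` and `ReporterCacheSketch` are hypothetical stand-ins for FaultReporter and the bridge's reporter cache, and the resolver is injected so the example runs without rclcpp. One assumption goes beyond this report: while the FQN is unresolved, the sketch returns an uncached reporter keyed by the action name. The actual fix may choose a different interim source_id.

```cpp
#include <cstddef>
#include <functional>
#include <memory>
#include <optional>
#include <string>
#include <unordered_map>
#include <utility>

// Hypothetical stand-in for the FaultReporter named in this report.
struct ReporterSketch {
  std::string source_id;
};

class ReporterCacheSketch {
 public:
  // Resolver returns the server FQN, or nullopt while discovery is incomplete.
  using Resolver = std::function<std::optional<std::string>(const std::string&)>;

  explicit ReporterCacheSketch(Resolver resolve) : resolve_(std::move(resolve)) {}

  std::shared_ptr<ReporterSketch> reporter_for(const std::string& action) {
    if (auto it = cache_.find(action); it != cache_.end()) {
      return it->second;  // only resolved reporters are ever stored
    }
    std::optional<std::string> fqn = resolve_(action);
    if (!fqn) {
      // Assumption: interim reporter keyed by the action name, deliberately not cached,
      // so the next call resolves again.
      return std::make_shared<ReporterSketch>(ReporterSketch{action});
    }
    auto reporter = std::make_shared<ReporterSketch>(ReporterSketch{*fqn});
    cache_.emplace(action, reporter);
    return reporter;
  }

  std::size_t cached() const { return cache_.size(); }

 private:
  Resolver resolve_;
  std::unordered_map<std::string, std::shared_ptr<ReporterSketch>> cache_;
};
```

With this policy a fault raised during discovery is still reported, and later faults pick up the real node FQN as soon as it becomes available.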