[BUG] action_status_bridge: faults attributed to discovery placeholder node name + missing subscription destructor #465

Description

@bburda

Bug report

Two related defects in ActionStatusBridgeNode that make its integration test flaky and mis-attribute faults.

1. Faults attributed to the DDS discovery placeholder node name

server_fqn_for_action() resolves the action server's node FQN from get_publishers_info_by_topic(<action>/_action/status) and skips only those publishers whose node_name() is empty. During DDS discovery, however, the participant becomes known before its node name and namespace propagate. In that window rcl returns the placeholders _NODE_NAME_UNKNOWN_ / _NODE_NAMESPACE_UNKNOWN_, which are non-empty and therefore pass the check, so the bridge builds source_id = "_NODE_NAMESPACE_UNKNOWN_/_NODE_NAME_UNKNOWN_".

reporter_for() then caches the FaultReporter built from that unresolved source_id for the node's lifetime, so faults stay permanently attributed to the placeholder even after discovery completes.
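The intended behavior can be sketched without any ROS dependencies. This is a minimal illustration, not the bridge's actual code: resolve_fqn, Reporter, ReporterCache, and the choice of the action name as a temporary fallback source_id are all hypothetical; only the two placeholder strings come from the report above.

```cpp
#include <functional>
#include <map>
#include <memory>
#include <optional>
#include <string>
#include <utility>

// Placeholder strings reported while node identity is still unknown
// (taken from the issue text).
constexpr const char * kUnknownNodeName = "_NODE_NAME_UNKNOWN_";
constexpr const char * kUnknownNodeNamespace = "_NODE_NAMESPACE_UNKNOWN_";

// Hypothetical helper: returns the node FQN, or nullopt while unresolved.
std::optional<std::string> resolve_fqn(
  const std::string & node_namespace, const std::string & node_name)
{
  if (node_name.empty() || node_name == kUnknownNodeName ||
    node_namespace == kUnknownNodeNamespace)
  {
    return std::nullopt;  // treat placeholders exactly like an empty name
  }
  if (node_namespace.empty() || node_namespace == "/") {
    return "/" + node_name;
  }
  return node_namespace + "/" + node_name;
}

// Stand-in for FaultReporter; only the source_id matters here.
struct Reporter
{
  std::string source_id;
};

// Hypothetical cache that stores a reporter only once the FQN is resolved.
class ReporterCache
{
public:
  using Resolver = std::function<std::optional<std::string>(const std::string &)>;

  explicit ReporterCache(Resolver resolver) : resolve_(std::move(resolver)) {}

  std::shared_ptr<Reporter> reporter_for(const std::string & action)
  {
    auto it = cache_.find(action);
    if (it != cache_.end()) {
      return it->second;  // resolved earlier, safe to reuse
    }
    auto fqn = resolve_(action);
    if (!fqn) {
      // Assumption: fall back to the action name, and do NOT cache,
      // so the next call re-resolves.
      return std::make_shared<Reporter>(Reporter{action});
    }
    auto reporter = std::make_shared<Reporter>(Reporter{*fqn});
    cache_.emplace(action, reporter);
    return reporter;
  }

private:
  Resolver resolve_;
  std::map<std::string, std::shared_ptr<Reporter>> cache_;
};
```

With this shape, a fault raised during discovery gets a temporary source_id, and the first call after discovery completes picks up the real FQN and caches it from then on.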

2. Missing subscription destructor

ActionStatusBridgeNode holds status subscriptions and a rescan timer whose callbacks capture this, but it declares no destructor. Nothing releases those handles before the node's other members are torn down, so on teardown a callback can fire on a partially destroyed node and crash (SIGABRT).
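The ordering issue can be shown with plain C++ stand-ins. In this sketch Handle, State, and BridgeSketch are hypothetical names, not the real rclcpp or bridge types; the point is only that a destructor body runs before any member is destroyed, so releasing the timer and subscriptions there happens while the state their callbacks use is still alive.

```cpp
#include <memory>
#include <string>
#include <utility>
#include <vector>

// Stand-in for a timer or subscription handle; records when it is released.
struct Handle
{
  Handle(std::string n, std::vector<std::string> & ev)
  : name(std::move(n)), events(ev) {}
  ~Handle() { events.push_back("release " + name); }
  std::string name;
  std::vector<std::string> & events;
};

// Stand-in for node state that callbacks reach through `this`.
struct State
{
  explicit State(std::vector<std::string> & ev) : events(ev) {}
  ~State() { events.push_back("destroy state"); }
  std::vector<std::string> & events;
};

class BridgeSketch
{
public:
  explicit BridgeSketch(std::vector<std::string> & events)
  : rescan_timer_(std::make_shared<Handle>("timer", events)), state_(events)
  {
    subscriptions_.push_back(std::make_shared<Handle>("subscription", events));
  }

  ~BridgeSketch()
  {
    // Release callback sources first, while state_ is still valid.
    rescan_timer_.reset();
    subscriptions_.clear();
  }

private:
  std::shared_ptr<Handle> rescan_timer_;
  std::vector<std::shared_ptr<Handle>> subscriptions_;
  // Declared last, so without the explicit destructor it would be
  // destroyed before the handles above.
  State state_;
};
```

Without the explicit destructor, members are destroyed in reverse declaration order, so state_ would go away while the timer and subscription handles still exist. Whether resetting the handles is sufficient in the real node also depends on how the executor is stopped, which this sketch does not model.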

Steps to reproduce

  1. Start an action server and the action_status_bridge.
  2. Trigger an action result the bridge maps to a fault (e.g. ABORTED) shortly after startup, while DDS discovery is still settling.
  3. Inspect the fault's reporting_sources: it contains the placeholder instead of the server FQN. Shutting the bridge down also crashes intermittently.

Expected behavior

  • reporting_sources contains the action server node FQN (e.g. /test_action_status_client).
  • Clean shutdown.

Actual behavior

  • reporting_sources contains _NODE_NAMESPACE_UNKNOWN_/_NODE_NAME_UNKNOWN_ and never corrects.
  • Intermittent SIGABRT on teardown.

Both surface as flaky failures of test_integration.

Environment

  • ros2_medkit version: current main
  • ROS 2 distro: Jazzy / Humble / Lyrical
  • OS: Ubuntu 24.04

Additional information

Fix: treat the placeholder as unresolved in server_fqn_from_endpoint(), and do not permanently cache a reporter built from an unresolved FQN (re-resolve until the real node name appears). Also add a destructor that resets the rescan timer and subscriptions before the node is destroyed.

Labels: bug
