[BUG] Flaky CI: test_resource_change_notifier timeout + test_hybrid_suppression SIGSEGV on Humble #344

@bburda

Bug report

Two tests are flaky in CI on the main branch.

Test 1: test_resource_change_notifier - 60s timeout hang

Steps to reproduce:

  1. Run test_resource_change_notifier under CPU pressure (CI runners)
  2. Test MultipleSubscribersAllCalled hangs until CTest 60s timeout
  3. No result XML generated ("missing_result")

Expected behavior: All 16 tests pass in under 5 seconds.

Actual behavior: Binary hangs on MultipleSubscribersAllCalled, killed by CTest timeout.

Root cause: ResourceChangeNotifier notifier is declared first in every test, so it is destroyed last, after the std::promise/std::atomic variables that the worker thread's callbacks reference. When future.wait_for(2s) times out under CI load, the test returns and destroys the promise while the worker thread is still calling set_value() on it. That is undefined behavior, which here shows up as a hang inside the corrupted promise internals. The notifier destructor then calls join(), which blocks forever.
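The declaration-order point can be sketched with the standard library alone. DemoNotifier below is a hypothetical stand-in for ResourceChangeNotifier (it only runs one delayed callback on a worker thread and joins in its destructor), and wait_with_timeout is an illustrative test body, not code from the repository. Because the notifier is declared after the promise, it is destroyed first, so join() completes while the promise is still alive even when the wait times out.

```cpp
#include <atomic>
#include <chrono>
#include <functional>
#include <future>
#include <thread>
#include <utility>

// Hypothetical stand-in for ResourceChangeNotifier: runs one callback on a
// worker thread and joins that thread in its destructor.
class DemoNotifier {
public:
  explicit DemoNotifier(std::function<void()> callback)
  : worker_([cb = std::move(callback)] {
      // Simulate a loaded CI runner: the callback fires late.
      std::this_thread::sleep_for(std::chrono::milliseconds(50));
      cb();
    }) {}

  ~DemoNotifier() {
    if (worker_.joinable()) {
      worker_.join();  // once this returns, the callback can no longer run
    }
  }

private:
  std::thread worker_;
};

// Illustrative test body. Locals are destroyed in reverse declaration order,
// so the order below guarantees: notifier joins first, promise dies last.
void wait_with_timeout(std::atomic<bool> & callback_finished) {
  std::promise<void> done;                        // declared first, destroyed last
  std::future<void> future = done.get_future();

  DemoNotifier notifier([&] {                     // declared last, destroyed first
    done.set_value();                             // promise is still alive here
    callback_finished = true;
  });

  // Deliberately short wait so the timeout path is the one exercised.
  (void)future.wait_for(std::chrono::milliseconds(1));
}  // ~DemoNotifier joins the worker before done and future are destroyed
```

Swapping the two declarations would reproduce the hazard described above: the promise would be destroyed on timeout while the worker could still call set_value() on it.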

Observed on: Rolling (run 23909922091, 2026-04-02)

Test 2: test_hybrid_suppression - SIGSEGV on Humble

Steps to reproduce:

  1. Run test_hybrid_suppression integration test on Humble
  2. Demo nodes crash with SIGSEGV (exit code -11) during SIGINT shutdown
  3. test_exit_codes fails because -11 is not in ALLOWED_EXIT_CODES

Expected behavior: All demo nodes exit cleanly with 0, SIGINT, or SIGTERM.

Actual behavior: Random demo nodes (brake_actuator, brake_pressure_sensor) crash with SIGSEGV.

Root cause: Two combined issues. (A) BrakeActuator and LightController have subscriptions with this-capturing callbacks, but their destructors don't reset those subscriptions before member destruction begins. (B) A Humble-specific rclcpp::spin() teardown race: DDS callbacks fire on partially destroyed nodes during SIGINT shutdown.
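Issue (A) comes down to member lifetime: a subscription declared before the state its callback touches is destroyed after that state unless the destructor releases it explicitly. The sketch below shows the pattern with standard-library stand-ins only. DemoBus, DemoSubscription, and DemoBrakeActuator are hypothetical names, not rclcpp or ros2_medkit types, and the example is single-threaded, so it illustrates the ordering but does not reproduce the race.

```cpp
#include <functional>
#include <memory>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-in for a subscription handle: it owns the callback.
struct DemoSubscription {
  std::function<void(double)> callback;
};

// Hypothetical stand-in for the middleware: delivers only to live subscriptions.
class DemoBus {
public:
  std::shared_ptr<DemoSubscription> subscribe(std::function<void(double)> cb) {
    auto sub = std::make_shared<DemoSubscription>();
    sub->callback = std::move(cb);
    subscriptions_.push_back(sub);  // stored as weak_ptr, the node owns the handle
    return sub;
  }

  // Returns how many callbacks were invoked.
  int publish(double value) {
    int delivered = 0;
    for (auto & weak : subscriptions_) {
      if (auto sub = weak.lock()) {
        sub->callback(value);
        ++delivered;
      }
    }
    return delivered;
  }

private:
  std::vector<std::weak_ptr<DemoSubscription>> subscriptions_;
};

class DemoBrakeActuator {
public:
  explicit DemoBrakeActuator(DemoBus & bus) {
    subscription_ = bus.subscribe([this](double pressure) {
      last_pressure_ = pressure;
      status_ = "updated";
    });
  }

  ~DemoBrakeActuator() {
    // Release the callback first. Without this, status_ (declared later)
    // would be destroyed before subscription_.
    subscription_.reset();
  }

  double last_pressure() const { return last_pressure_; }

private:
  std::shared_ptr<DemoSubscription> subscription_;  // declared first, destroyed last
  double last_pressure_ = 0.0;
  std::string status_ = "idle";
};
```

In the real nodes the same idea would apply to every handle whose callback captures this, timers included, as the fix plan notes.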

Observed on: Humble only (runs 23909922091, 23895290725, 23708608625)

Environment

  • ros2_medkit version: main (latest)
  • ROS 2 distro: Rolling (test 1), Humble (test 2)
  • OS: Ubuntu Noble / Jammy (GitHub Actions)

Fix plan

  • Test 1: Reorder declarations in all tests so that notifier is declared after the shared state and is therefore destroyed first
  • Test 2: Fix the demo node destructors (reset subscriptions/timers) and restructure main() to destroy the node before rclcpp::shutdown()

Metadata

Labels: bug
