
Gateway deadlocks when querying configurations on nodes without parameter service #318

@mfaferek93


Bug report

Steps to reproduce

  1. Start the gateway in runtime_only mode on a system with micro-ROS nodes (e.g., Yahboom ROSMASTER M3 Pro)
  2. The gateway discovers nodes such as YB_Node (micro-ROS bridge) and autostart_node, which do not expose the ROS 2 parameter service
  3. Send a GET request to /api/v1/apps/YB_Node/configurations
  4. The request blocks for 4-6 seconds waiting for a parameter service response
  5. During that time, ALL other HTTP requests queue behind it (httplib is single-threaded)
  6. If the gateway queries its own node's configurations (/apps/ros2_medkit_gateway/configurations), it deadlocks permanently: it cannot answer its own parameter service call while the HTTP thread is blocked waiting for that answer
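
The steps above can be timed with a small probe. This is only a sketch using the Python standard library; the helper names are hypothetical, and the base URL (host and port) is an assumption that has to match the actual gateway setup. Only the /api/v1/apps/YB_Node/configurations path is taken from this report.

```python
import http.client
import time
import urllib.error
import urllib.request
from concurrent.futures import ThreadPoolExecutor


def timed_get(url, timeout_sec=10.0):
    """Return (status, elapsed_sec); status is None if no HTTP response arrived."""
    start = time.monotonic()
    try:
        with urllib.request.urlopen(url, timeout=timeout_sec) as resp:
            status = resp.status
    except urllib.error.HTTPError as err:
        # 4xx/5xx responses (e.g. the 503 described in this report) land here.
        status = err.code
    except (urllib.error.URLError, http.client.HTTPException, OSError):
        # Connection refused, client-side timeout, dropped connection, ...
        status = None
    return status, time.monotonic() - start


def probe_concurrently(urls, timeout_sec=10.0):
    """Fire all requests at once so queuing behind a slow request becomes visible."""
    with ThreadPoolExecutor(max_workers=len(urls)) as pool:
        return list(pool.map(lambda u: timed_get(u, timeout_sec), urls))
```

For example, calling `probe_concurrently` with the YB_Node configurations URL listed twice (prefixed with the gateway's real host and port) should show whether the second request waits for the first one to finish.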

Expected behavior

Gateway should return quickly (< 0.5s) with empty configurations or 503 for nodes without parameter service. It should never deadlock.
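
A minimal sketch of that behavior, written in Python for brevity rather than as gateway code: the blocking lookup runs on a worker, and the handler gives up after a fixed deadline and reports 503. `get_configurations`, `fetch_params`, and the pool size are hypothetical names and values chosen for illustration.

```python
from concurrent.futures import ThreadPoolExecutor
from concurrent.futures import TimeoutError as FutureTimeout

# Hypothetical worker pool dedicated to parameter service lookups.
_param_pool = ThreadPoolExecutor(max_workers=4)


def get_configurations(fetch_params, deadline_sec=0.5):
    """Return (http_status, body) without waiting longer than deadline_sec.

    fetch_params stands in for the blocking parameter service call.
    """
    future = _param_pool.submit(fetch_params)
    try:
        return 200, future.result(timeout=deadline_sec)
    except FutureTimeout:
        # cancel() only helps if the call has not started yet; a call that is
        # already running keeps its worker busy until it returns on its own.
        future.cancel()
        return 503, {"error": "parameter service unavailable"}
```

The deadline bounds the latency seen by the HTTP client, but not the lifetime of the underlying call, so a pool like this can still be exhausted if many lookups hang at once.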

Actual behavior

  • GET /apps/YB_Node/configurations blocks for 4-6 seconds, returns 503
  • GET /apps/ros2_medkit_gateway/configurations causes permanent deadlock - gateway process alive but all HTTP endpoints unresponsive
  • GET /components/root/configurations also causes permanent deadlock
  • Once deadlocked, only kill -9 recovers the gateway
  • Cascading effect: any configurations request during the 4-6s block causes subsequent requests to time out, amplifying into a full gateway freeze
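
The self-query case can be illustrated with a toy model, assuming (as this report suggests) that the thread waiting for the parameter service reply is the same one that would have to produce it. Nothing here is gateway code; the function names are hypothetical and a Python thread pool merely stands in for the gateway's threads.

```python
from concurrent.futures import ThreadPoolExecutor
from concurrent.futures import TimeoutError as FutureTimeout


def self_call_completes(workers, wait_sec=0.2):
    """Return True if a handler can get an answer from its own pool in time."""
    with ThreadPoolExecutor(max_workers=workers) as pool:

        def answer_parameter_request():
            return {}

        def handle_http_request():
            # The handler asks its own pool for an answer and waits for it.
            inner = pool.submit(answer_parameter_request)
            try:
                inner.result(timeout=wait_sec)
                return True
            except FutureTimeout:
                # With one worker, the only thread that could answer is the
                # one blocked right here. Without a timeout this never ends.
                return False

        return pool.submit(handle_http_request).result()
```

With `workers=1` the inner call cannot start until the handler gives up, while with `workers=2` it completes immediately. In this model the timeout is what turns a permanent deadlock into a delayed failure.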

Environment

  • ros2_medkit version: main branch (commit ~00b243e)
  • ROS 2 distro: Humble
  • OS: Ubuntu 22.04 on Jetson Orin Nano (aarch64), L4T R36.4.3
  • Robot: Yahboom ROSMASTER M3 Pro with micro-ROS nodes

Additional information

This is the root cause of all gateway stability issues observed during M3 Pro testing. The web UI triggers it by loading an entity detail view, which fetches configurations.

The parameter_service_timeout_sec parameter exists (default 2.0s), but reducing it to 0.5s did not help; the blocking still occurs.

Possible fixes:

  • Make parameter service calls non-blocking (async with callback, don't block httplib thread)
  • Cache negative results - if a node doesn't respond to parameter service once, don't try again for N seconds
  • Detect self-query deadlock - never query gateway's own node configurations via parameter service
  • Use a thread pool for parameter service calls so they don't block the HTTP server thread
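
Two of these ideas, the negative cache and the self-query guard, can be sketched together. This is an illustration in Python under stated assumptions, not the gateway's implementation: `NegativeCache`, `should_query`, and the 30-second default are hypothetical.

```python
import threading
import time


class NegativeCache:
    """Remembers nodes whose parameter service did not answer, for ttl_sec."""

    def __init__(self, ttl_sec=30.0, clock=time.monotonic):
        self._ttl = ttl_sec
        self._clock = clock
        self._failed_at = {}
        self._lock = threading.Lock()  # handlers may run on several threads

    def mark_unavailable(self, node):
        with self._lock:
            self._failed_at[node] = self._clock()

    def is_unavailable(self, node):
        with self._lock:
            failed_at = self._failed_at.get(node)
            if failed_at is None:
                return False
            if self._clock() - failed_at >= self._ttl:
                # Entry expired: forget it so the node gets retried.
                del self._failed_at[node]
                return False
            return True


def should_query(node, own_node, cache):
    """Skip the parameter service for the gateway itself and for known-dead nodes."""
    if node == own_node:
        return False
    return not cache.is_unavailable(node)
```

A handler would call `should_query` before contacting the parameter service and `mark_unavailable` after a timeout, so that repeated requests for a node like YB_Node return immediately until the entry expires.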

Labels: bug
