Gateway deadlocks when querying configurations on nodes without parameter service #318
Closed
Labels: bug (Something isn't working)
Description
Bug report
Steps to reproduce
- Start the gateway in runtime_only mode on a system with micro-ROS nodes (e.g., Yahboom ROSMASTER M3 Pro).
- The gateway discovers nodes such as YB_Node (micro-ROS bridge) and autostart_node, which don't have a ROS 2 parameter service.
- Send a GET request to /api/v1/apps/YB_Node/configurations.
- The request blocks for 4-6 seconds waiting for a parameter service response.
- During that time, ALL other HTTP requests queue behind it (httplib is single-threaded).
- If the gateway queries its own node's configurations (/apps/ros2_medkit_gateway/configurations), it deadlocks permanently: it can't respond to its own parameter service call while the HTTP thread is blocked.
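The self-query deadlock can be reproduced outside ROS with a generic model. The sketch below assumes, as the report describes, that one thread both waits for the parameter reply and is the only thread able to produce it. `SingleThreadLoop`, `self_query` and `demo_self_deadlock` are hypothetical names, not ros2_medkit or rclcpp APIs. A bounded wait turns the permanent hang into an observable timeout.

```cpp
#include <cassert>
#include <chrono>
#include <condition_variable>
#include <functional>
#include <future>
#include <memory>
#include <mutex>
#include <queue>
#include <thread>
#include <utility>

// One worker thread runs queued tasks in order. It models "the thread that
// must both wait for the reply and produce it"; it is not the gateway's code.
class SingleThreadLoop {
 public:
  SingleThreadLoop() : worker_([this] { run(); }) {}

  ~SingleThreadLoop() {
    {
      std::lock_guard<std::mutex> lock(mu_);
      stop_ = true;
    }
    cv_.notify_one();
    worker_.join();  // drains the remaining tasks, then exits
  }

  void post(std::function<void()> task) {
    {
      std::lock_guard<std::mutex> lock(mu_);
      tasks_.push(std::move(task));
    }
    cv_.notify_one();
  }

 private:
  void run() {
    for (;;) {
      std::function<void()> task;
      {
        std::unique_lock<std::mutex> lock(mu_);
        cv_.wait(lock, [this] { return stop_ || !tasks_.empty(); });
        if (tasks_.empty()) return;  // stop_ is set and nothing is left
        task = std::move(tasks_.front());
        tasks_.pop();
      }
      task();  // runs without holding the lock, so tasks may post more tasks
    }
  }

  std::mutex mu_;
  std::condition_variable cv_;
  std::queue<std::function<void()>> tasks_;
  bool stop_ = false;
  std::thread worker_;  // declared last so it starts after the members above
};

// An "HTTP handler" on the loop asks its own node for parameters. The reply
// callback is queued on the same loop, so it cannot run until the handler
// returns. Returns true only if the reply arrived within `timeout`.
bool self_query(SingleThreadLoop& loop, std::chrono::milliseconds timeout) {
  auto handler_done = std::make_shared<std::promise<bool>>();
  auto answered = handler_done->get_future();
  loop.post([&loop, handler_done, timeout] {
    auto reply = std::make_shared<std::promise<int>>();
    auto reply_future = reply->get_future();
    loop.post([reply] { reply->set_value(42); });  // the "parameter service"
    // Waiting here starves the loop; with an unbounded wait this never returns.
    bool ok = reply_future.wait_for(timeout) == std::future_status::ready;
    handler_done->set_value(ok);
  });
  return answered.get();
}

void demo_self_deadlock() {
  using namespace std::chrono_literals;
  SingleThreadLoop loop;
  assert(!self_query(loop, 200ms));  // the handler gives up after 200 ms
}
```

Shortening the timeout only shortens the stall; the reply still cannot be produced while the only serving thread is busy waiting for it.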
Expected behavior
The gateway should return quickly (< 0.5 s) with empty configurations or a 503 for nodes without a parameter service. It should never deadlock.
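A minimal sketch of meeting this latency bound, assuming the blocking parameter service call can be wrapped in a callable. `HttpReply`, `fetch_with_budget` and the `fetch` argument are hypothetical names. The call runs on a separate thread and the handler waits at most a fixed budget before answering 503. It deliberately avoids `std::async`: the future that `std::async` returns blocks in its destructor until the task finishes, which would bring the stall back.

```cpp
#include <cassert>
#include <chrono>
#include <exception>
#include <functional>
#include <future>
#include <memory>
#include <string>
#include <thread>
#include <utility>

struct HttpReply {
  int status;        // HTTP status code
  std::string body;  // response body (JSON kept as a plain string here)
};

// Hypothetical helper: runs `fetch` (a stand-in for the blocking parameter
// service call) on its own thread and waits at most `budget` for the result.
HttpReply fetch_with_budget(std::function<std::string()> fetch,
                            std::chrono::milliseconds budget) {
  assert(budget.count() > 0 && "budget must be positive");
  auto result = std::make_shared<std::promise<std::string>>();
  std::future<std::string> future = result->get_future();

  // Detached thread: if the handler gives up, nothing here waits for it.
  std::thread([result, fetch = std::move(fetch)] {
    try {
      result->set_value(fetch());
    } catch (...) {
      result->set_exception(std::current_exception());
    }
  }).detach();

  if (future.wait_for(budget) != std::future_status::ready) {
    return {503, R"({"error":"parameter service did not respond in time"})"};
  }
  try {
    return {200, future.get()};
  } catch (...) {
    return {503, R"({"error":"parameter service call failed"})"};
  }
}
```

Each timed-out call leaves a detached thread running until the underlying call returns, so a real fix would also need to cap how many such calls can be outstanding at once.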
Actual behavior
- GET /apps/YB_Node/configurations blocks for 4-6 seconds, then returns 503.
- GET /apps/ros2_medkit_gateway/configurations causes a permanent deadlock: the gateway process stays alive but all HTTP endpoints are unresponsive.
- GET /components/root/configurations also causes a permanent deadlock.
- Once deadlocked, only kill -9 recovers the gateway.
- Cascading effect: any configurations request during the 4-6 s block causes subsequent requests to time out, amplifying into a full gateway freeze.
Environment
- ros2_medkit version: main branch (commit ~00b243e)
- ROS 2 distro: Humble
- OS: Ubuntu 22.04 on Jetson Orin Nano (aarch64), L4T R36.4.3
- Robot: Yahboom ROSMASTER M3 Pro with micro-ROS nodes
Additional information
This is the root cause of all gateway stability issues observed during M3 Pro testing. The Web UI triggers it when loading an entity's detail view, which fetches configurations.
The parameter_service_timeout_sec parameter exists (default 2.0 s), but reducing it to 0.5 s did not help: the blocking still occurs.
Possible fixes:
- Make parameter service calls non-blocking (async with a callback, so the httplib thread is never blocked)
- Cache negative results: if a node doesn't respond to the parameter service once, don't try again for N seconds
- Detect the self-query deadlock: never query the gateway's own node configurations via the parameter service
- Use a thread pool for parameter service calls so they don't block the HTTP server thread
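Two of these fixes, caching negative results and never querying the gateway's own node over its parameter service, fit in one small guard consulted before each call. A minimal sketch, assuming nodes are identified by name strings; `ParamServiceGuard`, its methods and the TTL are hypothetical, not existing ros2_medkit code. The gateway's own node could presumably serve its parameters directly from its own parameter interface instead of calling its service.

```cpp
#include <cassert>
#include <chrono>
#include <mutex>
#include <string>
#include <unordered_map>
#include <utility>

// Hypothetical guard: a negative-result cache with a TTL plus a refusal to
// query the gateway's own node through the parameter service.
class ParamServiceGuard {
 public:
  using Clock = std::chrono::steady_clock;

  ParamServiceGuard(std::string self_node, Clock::duration negative_ttl)
      : self_node_(std::move(self_node)), negative_ttl_(negative_ttl) {
    assert(negative_ttl_ > Clock::duration::zero());
  }

  // Should the gateway attempt a parameter service call to `node` right now?
  bool should_query(const std::string& node, Clock::time_point now = Clock::now()) {
    if (node == self_node_) {
      return false;  // serve own parameters locally, never via the service
    }
    std::lock_guard<std::mutex> lock(mu_);
    auto it = failed_until_.find(node);
    if (it == failed_until_.end()) return true;
    if (now >= it->second) {  // entry expired: allow one fresh attempt
      failed_until_.erase(it);
      return true;
    }
    return false;  // still inside the negative-cache window: fail fast
  }

  // Call after a timeout or "service not available" result.
  void record_failure(const std::string& node, Clock::time_point now = Clock::now()) {
    std::lock_guard<std::mutex> lock(mu_);
    failed_until_[node] = now + negative_ttl_;
  }

  // Call after a successful reply; forgets any earlier failure.
  void record_success(const std::string& node) {
    std::lock_guard<std::mutex> lock(mu_);
    failed_until_.erase(node);
  }

 private:
  const std::string self_node_;
  const Clock::duration negative_ttl_;
  std::mutex mu_;
  std::unordered_map<std::string, Clock::time_point> failed_until_;
};
```

While a node is inside its negative-cache window, the handler can answer immediately with empty configurations or a 503, matching the expected behavior, instead of paying the full timeout again on every Web UI refresh.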