Handle find_container_node_names error #322
Signed-off-by: ivanpauno <ivanpauno@ekumenlabs.com>
I think that's a fair rationale, so it LGTM to get back to green CI, but I do not agree with
IMHO the solution is to simply not raise generic exceptions such as
It is good to know that this race is responsible for the failing test. I agree with the previous comment, though, that the RMW API should return a distinguishable error code, and that the case where the node simply doesn't exist needs to be signaled with a different exception type in
If we don't address the root of the problem now and only use this hack to ignore all kinds of errors, I doubt it will get revisited. If we immediately follow up with this improvement, I am fine with merging this hack as a short-term workaround.
I agree that's an easier solution, but the other solution I described avoids the race condition directly, instead of handling the error after failing to find the node name.
I agree, I will actually forget.
Yes, you're right, but we're a bit further away from being able to pull that one off.
LGTM!
* Handle find_container_node_names error (#322)
  Signed-off-by: ivanpauno <ivanpauno@ekumenlabs.com>
* Catch generic RuntimeError instead
  Since rclpy.node.NodeNameNonExistentError is not defined in Dashing.
  Signed-off-by: Jacob Perron <jacob@openrobotics.org>
Fix #321.
There is a race condition, as I described here.
The error I'm catching comes from here:
https://github.com/ros2/rclpy/blob/8da91cee54b68f1108ca017da885496ffb5ae16c/rclpy/src/rclpy/_rclpy.c#L3137-L3146
The problem is that doing this may also hide other errors.
`rcl_get_service_names_and_types_by_node` is a wrapper of `rmw_get_service_names_and_types_by_node`. `rmw_get_service_names_and_types_by_node` can fail not only because the node name wasn't found, but also because of allocation problems, etc. (it depends on the rmw implementation), and I don't have a way to tell these errors apart from `rclpy` or the layers above it.

Ideally, the best way of solving this would be to list the services of all nodes, using:
ros2cli/ros2service/ros2service/api/__init__.py
Lines 22 to 28 in 9f28544
And then recognize the node that was hosting the service from the fully qualified service name.
This won't be possible until we go ahead with ros2/design#241.
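To show why the hosting node cannot be recovered from the fully qualified service name alone, here is a minimal sketch. It assumes the name has the shape namespace, then node name, then node-relative service name. `CandidateSplit` and `candidate_splits` are hypothetical helpers written for this illustration, not part of ros2cli or rclpy.

```python
from typing import List, NamedTuple


class CandidateSplit(NamedTuple):
    """One possible reading of a fully qualified service name (hypothetical)."""

    namespace: str
    node_name: str
    service_name: str


def candidate_splits(fully_qualified_service: str) -> List[CandidateSplit]:
    """Enumerate every (namespace, node name, relative service) reading."""
    tokens = [token for token in fully_qualified_service.split('/') if token]
    splits = []
    # Take token i as the node name: everything before it is the namespace,
    # everything after it is the node-relative service name.
    for i in range(len(tokens) - 1):
        namespace = '/' + '/'.join(tokens[:i])
        service_name = '/'.join(tokens[i + 1:])
        splits.append(CandidateSplit(namespace, tokens[i], service_name))
    return splits


for split in candidate_splits('/asd/bsd/_container/load_node'):
    print(split)
```

Purely syntactically, the name admits more than one reading, so a caller cannot tell which token is the node name without extra information.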
There is currently ambiguity. `/asd/bsd/_container/load_node` can be a node with `ns=/asd` and `name=bsd`, where the service name is `_container/load_node`. Or it can be a node with `ns=/` and `name=asd`, where the service name is `bsd/_container/load_node` (an unusual choice).

Looking at the four places where `find_container_node_names` is used (1, 2, 3, 4), I prefer hiding some unusual errors (bad allocations, etc.) to avoid the race condition (which is a frequent error). The only consequence is that, when any error occurs, the node container is not found and is skipped.
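The trade-off can be sketched roughly as follows. This is not the code in this PR: `find_container_nodes` and `fake_get_services` are hypothetical stand-ins that run without rclpy, and matching on the `_container/load_node` suffix is an assumption made for illustration.

```python
from typing import Callable, Iterable, List, Tuple

# (service name, list of service types), the shape assumed for this sketch.
ServiceList = List[Tuple[str, List[str]]]


def find_container_nodes(
    node_names: Iterable[str],
    get_services: Callable[[str], ServiceList],
) -> List[str]:
    """Return the nodes that expose a `_container/load_node` service."""
    containers = []
    for node_name in node_names:
        try:
            services = get_services(node_name)
        except RuntimeError:
            # The node may have gone away between listing the nodes and
            # querying its services (the race). Other failures are hidden
            # too; either way the node is simply skipped.
            continue
        if any(name.endswith('_container/load_node') for name, _ in services):
            containers.append(node_name)
    return containers


def fake_get_services(node_name: str) -> ServiceList:
    """Stand-in for the per-node service query (hypothetical)."""
    if node_name == '/vanished':
        raise RuntimeError('node name not found')
    if node_name == '/my_container':
        return [('/my_container/_container/load_node',
                 ['composition_interfaces/srv/LoadNode'])]
    return [('/talker/describe_parameters',
             ['rcl_interfaces/srv/DescribeParameters'])]


print(find_container_nodes(
    ['/my_container', '/vanished', '/talker'], fake_get_services))
```

Catching the generic `RuntimeError` keeps the listing alive when a node disappears mid-query, at the cost of treating every other failure of the query the same way.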