Cyclone DDS hangs in Galactic within a VM with all defaults using ros2cli #383

Closed · AlexisTM opened this issue Mar 30, 2022 · 4 comments

AlexisTM commented Mar 30, 2022

Bug report

Required Info:

Steps to reproduce issue

- Build ROS2 from source
- ros2 run examples_rclcpp_minimal_composition composition_composed # The node doesn't matter
- ros2 node list # in another terminal

Expected behavior

ros2 node list returns the list of the nodes

Actual behavior

The program hangs and never exits (even with CTRL+C).

Additional information

Running RMW_IMPLEMENTATION=rmw_fastrtps_cpp ros2 node list has the correct behaviour
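
For example, a minimal sketch of switching the CLI over (assuming a sourced Galactic workspace; the ros2cli daemon keeps using the RMW it was started with, so it has to be restarted for the change to take effect):

ros2 daemon stop                                     # drop the daemon that was started with Cyclone DDS
RMW_IMPLEMENTATION=rmw_fastrtps_cpp ros2 node list   # the CLI (and the daemon it respawns) now use Fast DDS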

Starting the daemon with --debug prevents the hang, but it doesn't produce the expected output (below, the /topic topic exists and is published by the previously started composed node, which is still running fine).

user@apertis:~/ros2/bosch$ ros2 daemon stop 
The daemon has been stopped

user@apertis:~/ros2/bosch$ ros2 daemon start --debug
Interface kind: 2, info: [('10.0.2.2', 'enp0s3', True)]
Addresses by interfaces: {2: {'enp0s3': '10.0.2.15'}}
Serving XML-RPC on localhost:11511/ros2cli/
The daemon has been started

user@apertis:~/ros2/bosch$ ros2 topic echo /topic
get_topic_names_and_types()
Interface kind: 2, info: [('10.0.2.2', 'enp0s3', True)]
Addresses by interfaces: {2: {'enp0s3': '10.0.2.15'}}
get_name()
Interface kind: 2, info: [('10.0.2.2', 'enp0s3', True)]
Addresses by interfaces: {2: {'enp0s3': '10.0.2.15'}}
get_namespace()
Interface kind: 2, info: [('10.0.2.2', 'enp0s3', True)]
Addresses by interfaces: {2: {'enp0s3': '10.0.2.15'}}
WARNING: topic [/topic] does not appear to be published yet
Could not determine the type for the passed topic

user@apertis:~/ros2/bosch$ ros2 node list
get_node_names_and_namespaces()
Interface kind: 2, info: [('10.0.2.2', 'enp0s3', True)]
Addresses by interfaces: {2: {'enp0s3': '10.0.2.15'}}
AlexisTM changed the title from "Cyclone DDS hangs in Galactic within a VM with all defaults" to "Cyclone DDS hangs in Galactic within a VM with all defaults using ros2cli" on Mar 30, 2022
eboasson (Collaborator) commented

I am afraid I don't (won't) have an Apertis 2022 SDK at hand. It certainly sounds like that is exhibiting some behaviour that interacts badly with Cyclone. Unusual, but not impossible.

What I usually suggest is to gather some tracing information from Cyclone DDS, because that usually gives some insight into what is going on when everything works but there is no communication. Enabling it is as simple as putting

CYCLONEDDS_URI="<Tr><V>finest</><Out>cdds.log.\${CYCLONEDDS_PID}</>"

in the environment. That log starts with the configuration options, then you get some information on network selection and so on.
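
For example, a minimal sketch assuming a bash-like shell (single quotes keep ${CYCLONEDDS_PID} literal so Cyclone DDS expands it per process):

export CYCLONEDDS_URI='<Tr><V>finest</><Out>cdds.log.${CYCLONEDDS_PID}</>'
ros2 run examples_rclcpp_minimal_composition composition_composed   # each process writes its own cdds.log.<pid>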

It is not so likely to give a clue as to why it hangs. For that, generally the best thing to do is to attach gdb and get stack traces for all threads (thread apply all bt) and look for something suspicious.
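
For example, a sketch assuming gdb and pgrep are available (attaching may require sudo depending on the system's ptrace settings):

# attach to the hanging CLI process and dump the stack of every thread
gdb -batch -ex 'thread apply all bt' -p "$(pgrep -f 'ros2 node list')"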

AlexisTM (Author) commented

For the following logs, no other node was running.

cdds.log.1189.txt

cdds.log.1190.txt

eboasson (Collaborator) commented

Looks to me like you have a firewall blocking all multicast traffic. This causes two problems:

  • Discovery no longer works (several other DDS implementations always cater for unicast discovery, but Cyclone by default doesn't if the interface supports multicast);
  • Some threads are dedicated to receiving data from a single socket, and on shutdown those need to be unblocked. The best mechanism I have been able to find for that is sending a packet to the socket, but that does require the IP stack to cooperate.

I suspect the hanging is caused by ROS 2's signal handler being "too nice" if the impossible happens and stopping cleanly turns out to be impossible.

I'd suggest allowing multicast, but you can also disable multicast altogether (it is just that you lose out on a lot of niceties and add a significant amount of overhead):

<General>
  <AllowMulticast>false</AllowMulticast>
</General>
<Discovery>
  <ParticipantIndex>auto</ParticipantIndex>
  <Peers>
    <Peer address="localhost"/>
  </Peers>
</Discovery>

The hanging you could fix by not having threads dedicated to receiving data on a specific socket (Internal/MultipleReceiveThreads = false), but that alone won't make discovery work. Both of the options above that make discovery work should also solve the hanging.
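
Putting the pieces together, a sketch of a single configuration fragment that disables multicast, adds localhost as a unicast peer, and turns off the dedicated receive threads (element names as in the Cyclone DDS configuration documentation):

<General>
  <AllowMulticast>false</AllowMulticast>
</General>
<Discovery>
  <ParticipantIndex>auto</ParticipantIndex>
  <Peers>
    <Peer address="localhost"/>
  </Peers>
</Discovery>
<Internal>
  <MultipleReceiveThreads>false</MultipleReceiveThreads>
</Internal>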

AlexisTM (Author) commented

That seems to be the problem indeed. Thank you!
