Cannot communicate between nodes when only loopback is up #228

Closed
clalancette opened this issue Sep 11, 2018 · 18 comments
Labels
bug Something isn't working

Comments

@clalancette
Contributor

Bug report

Required Info:

  • Operating System:
    • Ubuntu 18.04, ARM64 (Rock64 board)
  • Installation type:
    • Source
  • Version or commit hash:
    • master
  • DDS implementation:
    • Fast-RTPS and Opensplice
  • Client library (if applicable):
    • rclcpp

Steps to reproduce issue

  1. Compile latest ros2 from master: colcon build --event-handlers console_cohesion+
  2. Source the workspace: source install/setup.bash
  3. Take down any active network interfaces, so that only loopback is running.
  4. Run ros2 run demo_nodes_cpp talker in one terminal.
  5. Run ros2 run demo_nodes_cpp listener in another terminal

Expected behavior

The talker and listener can communicate over localhost.

Actual behavior

The talker and listener cannot communicate.

Additional information

When testing this with Opensplice, I saw an error message to the effect that no multicast interfaces were available (I can get the full message later). This leads me to suspect that DDS/RTPS isn't doing any discovery because no multicast is available. However, it still feels like communication over localhost should work in this scenario, hence the bug report.



@dirk-thomas
Member

The interface must support multicast. That is simply a requirement. Have you tried to explicitly enable it (e.g. as described in https://answers.ros.org/question/300370/ros2-talker-cannot-communicate-with-listener/)?

Other vendors might offer alternatives (like shared memory) but I don't think that can be expected from every rmw implementation.

@Karsten1987
Contributor

Given that this issue is reported multiple times, should we note this on the ros2 wiki on the tutorials page? Or is there a suitable place for it?

@dirk-thomas
Member

A note that multicast needs to work for whatever network interface is used, plus a reference to ros2 multicast for testing it, would be good. It should probably also include the snippet from the answer showing how to change the configuration if necessary.

@sloretz
Contributor

sloretz commented Sep 11, 2018

duplicate of ros2/ros2#552 ?

@Karsten1987
Contributor

I went ahead and added https://github.com/ros2/ros2/wiki/Tutorials#troubleshooting to the tutorials page. I am happy to move the section to another better suited place. Let me know.

@clalancette
Contributor Author

> I went ahead and added https://github.com/ros2/ros2/wiki/Tutorials#troubleshooting to the tutorials page. I am happy to move the section to another better suited place. Let me know.

@Karsten1987 Thanks for that. I edited it a bit, since ros2 multicast receive needs to be running for the test to work.

> The interface must support multicast. That is simply a requirement. Have you tried to explicitly enable it (e.g. as described in https://answers.ros.org/question/300370/ros2-talker-cannot-communicate-with-listener/)?

I tried the suggestions in that post plus the suggestions in ros2/ros2#552, and I still cannot communicate with only localhost.

> duplicate of ros2/ros2#552 ?

Yeah, looks like the same thing. We can close this one if we want to concentrate on the other one, but we still need to figure this use case out.

@dirk-thomas
Member

> I tried the suggestions in that post plus the suggestions in ros2/ros2#552, and I still cannot communicate with only localhost.

You might want to be a bit more specific about what "still cannot communicate" means. Does the multicast test work after changing the settings?

@clalancette
Contributor Author

> You might want to be a bit more specific what "still cannot communicate" means. Does the multicast test work after changing the settings?

No, that test doesn't work. Both the ros2 multicast send and ros2 multicast receive commands throw exceptions:

$ ros2 multicast send
Sending one UDP multicast datagram...
Traceback (most recent call last):
  File "/home/rock64/ros2_ws/install/ros2cli/bin/ros2", line 11, in <module>
    load_entry_point('ros2cli==0.5.4', 'console_scripts', 'ros2')()
  File "/home/rock64/ros2_ws/install/ros2cli/lib/python3.6/site-packages/ros2cli/cli.py", line 69, in main
    rc = extension.main(parser=parser, args=args)
  File "/home/rock64/ros2_ws/install/ros2multicast/lib/python3.6/site-packages/ros2multicast/command/multicast.py", line 39, in main
    return extension.main(args=args)
  File "/home/rock64/ros2_ws/install/ros2multicast/lib/python3.6/site-packages/ros2multicast/verb/send.py", line 24, in main
    send(b'Hello World!')
  File "/home/rock64/ros2_ws/install/ros2multicast/lib/python3.6/site-packages/ros2multicast/api/__init__.py", line 25, in send
    s.sendto(data, (group, port))
OSError: [Errno 101] Network is unreachable
$ ros2 multicast receive
Waiting for UDP multicast datagram...
Traceback (most recent call last):
  File "/home/rock64/ros2_ws/install/ros2cli/bin/ros2", line 11, in <module>
    load_entry_point('ros2cli==0.5.4', 'console_scripts', 'ros2')()
  File "/home/rock64/ros2_ws/install/ros2cli/lib/python3.6/site-packages/ros2cli/cli.py", line 69, in main
    rc = extension.main(parser=parser, args=args)
  File "/home/rock64/ros2_ws/install/ros2multicast/lib/python3.6/site-packages/ros2multicast/command/multicast.py", line 39, in main
    return extension.main(args=args)
  File "/home/rock64/ros2_ws/install/ros2multicast/lib/python3.6/site-packages/ros2multicast/verb/receive.py", line 25, in main
    data, (host, port) = receive()
  File "/home/rock64/ros2_ws/install/ros2multicast/lib/python3.6/site-packages/ros2multicast/api/__init__.py", line 44, in receive
    s.setsockopt(socket.IPPROTO_IP, socket.IP_ADD_MEMBERSHIP, mreq)
OSError: [Errno 19] No such device

Additionally, I tried running ros2 run demo_nodes_cpp talker and ros2 run demo_nodes_cpp listener, and no data flows between them (the talker is continually publishing as expected, but the listener never receives anything).
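
For reference, the two failing calls in the tracebacks above (sendto on a multicast group and setsockopt with IP_ADD_MEMBERSHIP) can be reproduced outside of ROS 2 with plain sockets. The following is only a minimal sketch, not the ros2multicast implementation: the group address, the port, and the helper names build_mreq and can_join_multicast are assumptions made for illustration.

```python
import socket

# Assumed values for illustration only; the group and port that
# ros2 multicast actually uses may differ.
GROUP = "225.0.0.1"
PORT = 49150


def build_mreq(group: str) -> bytes:
    """Build an 8-byte ip_mreq: the group address followed by INADDR_ANY."""
    return socket.inet_aton(group) + socket.inet_aton("0.0.0.0")


def can_join_multicast(group: str = GROUP, port: int = PORT):
    """Try to join a multicast group; return (ok, error_message)."""
    try:
        with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
            sock.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
            sock.bind(("", port))
            # This is the call that fails with "No such device" when no
            # interface with a usable multicast route is available.
            sock.setsockopt(
                socket.IPPROTO_IP, socket.IP_ADD_MEMBERSHIP, build_mreq(group)
            )
        return True, ""
    except OSError as exc:
        return False, str(exc)


if __name__ == "__main__":
    ok, err = can_join_multicast()
    print("multicast join ok" if ok else f"multicast join failed: {err}")
```

If the join fails here as well, the problem is in the operating system's network configuration rather than in ROS 2 or the DDS implementation.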

@mjcarroll
Member

I was receiving the same error, but found out that I had to enable multicast on lo because it's not on by default (from: https://stackoverflow.com/a/30982079)

route add -net 224.0.0.0 netmask 240.0.0.0 dev lo
ifconfig lo multicast

Example output:

➜  ros2_ws ros2 multicast receive       
Waiting for UDP multicast datagram...
Traceback (most recent call last):
  File "/home/michael/workspaces/ros2_ws/install/ros2cli/bin/ros2", line 11, in <module>
    load_entry_point('ros2cli==0.5.4', 'console_scripts', 'ros2')()
  File "/home/michael/workspaces/ros2_ws/install/ros2cli/lib/python3.6/site-packages/ros2cli/cli.py", line 69, in main
    rc = extension.main(parser=parser, args=args)
  File "/home/michael/workspaces/ros2_ws/install/ros2multicast/lib/python3.6/site-packages/ros2multicast/command/multicast.py", line 39, in main
    return extension.main(args=args)
  File "/home/michael/workspaces/ros2_ws/install/ros2multicast/lib/python3.6/site-packages/ros2multicast/verb/receive.py", line 25, in main
    data, (host, port) = receive()
  File "/home/michael/workspaces/ros2_ws/install/ros2multicast/lib/python3.6/site-packages/ros2multicast/api/__init__.py", line 44, in receive
    s.setsockopt(socket.IPPROTO_IP, socket.IP_ADD_MEMBERSHIP, mreq)
OSError: [Errno 19] No such device
➜  ros2_ws sudo route add -net 224.0.0.0 netmask 240.0.0.0 dev lo
➜  ros2_ws sudo ifconfig lo multicast
➜  ros2_ws ros2 multicast receive                                
Waiting for UDP multicast datagram...
Received from 172.18.0.1:45392: 'Hello World!'
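
To confirm that ifconfig lo multicast took effect, the interface flags can be read from sysfs. This is a rough sketch assuming a Linux system where /sys/class/net/<iface>/flags holds the flags as a hexadecimal string; the function names are made up for illustration, and the constants are the Linux values from <linux/if.h>.

```python
from pathlib import Path

# Interface flag bits as defined in <linux/if.h>.
IFF_UP = 0x1
IFF_LOOPBACK = 0x8
IFF_MULTICAST = 0x1000


def parse_flags(text: str) -> int:
    """Parse the hexadecimal contents of /sys/class/net/<iface>/flags."""
    return int(text.strip(), 16)


def supports_multicast(flags: int) -> bool:
    """Return True if the MULTICAST bit is set in the interface flags."""
    return bool(flags & IFF_MULTICAST)


def read_iface_flags(iface: str = "lo") -> int:
    """Read the flags of a network interface from sysfs (Linux only)."""
    return parse_flags(Path(f"/sys/class/net/{iface}/flags").read_text())
```

Calling supports_multicast(read_iface_flags("lo")) should return False on a loopback interface in its default state and True after multicast has been enabled on it.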

@clalancette
Contributor Author

> I was receiving the same error, but found out that I had to enable multicast on lo because it's not on by default (from: https://stackoverflow.com/a/30982079)

Hm, OK. I did try that, but I'll give it another try a little later.

@mjcarroll
Member

It may be a false positive: that 172.18 network is a bridge (br) interface on my machine. I brought it down, but something else may have brought it back up.

@mjcarroll
Member

I just tried this again on my laptop, which doesn't have the bridge interface, and it seems to have worked there as well. It may be worth trying again, @clalancette.

@clalancette
Contributor Author

@mjcarroll and I did some more experimentation here. With all networking down except for lo, I get the error messages as specified in #228 (comment). If I follow the instructions at https://stackoverflow.com/a/30982079, then ros2 multicast send and ros2 multicast receive work properly and transmit a 'Hello World!' between them. I then tried:

ros2 run demo_nodes_cpp talker
ros2 run demo_nodes_cpp listener

Both successfully start, and the talker continually publishes messages. But the listener never seems to receive them. More debugging is going to be needed here. @richiprosima Not sure if this is something you have seen/dealt with before in Fast-RTPS, but any thoughts you have may be valuable.
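
Before debugging the DDS layer, it can help to confirm that the multicast route from the Stack Overflow answer (224.0.0.0 with netmask 240.0.0.0 on lo) is actually installed. The following is a sketch only, assuming a little-endian Linux host and the usual tab-separated layout of /proc/net/route; the function names are hypothetical.

```python
import socket
import struct


def hex_to_ip(field: str) -> str:
    """Decode an address field from /proc/net/route (little-endian host assumed)."""
    return socket.inet_ntoa(struct.pack("<I", int(field, 16)))


def multicast_route_ifaces(route_table: str) -> list:
    """Return the interfaces that carry a 224.0.0.0/240.0.0.0 route."""
    ifaces = []
    for line in route_table.splitlines()[1:]:  # skip the header row
        cols = line.split()
        if len(cols) < 8:
            continue
        # Column 1 is the destination, column 7 is the netmask.
        if hex_to_ip(cols[1]) == "224.0.0.0" and hex_to_ip(cols[7]) == "240.0.0.0":
            ifaces.append(cols[0])
    return ifaces


def read_route_table(path: str = "/proc/net/route") -> str:
    """Read the kernel IPv4 routing table (Linux only)."""
    with open(path) as f:
        return f.read()
```

If multicast_route_ifaces(read_route_table()) includes lo, the route change is in place, and a listener that still receives nothing points at the middleware rather than the routing setup.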

@clalancette
Contributor Author

After some more experimentation, I found out that if I set up loopback with https://stackoverflow.com/a/30982079 and use Opensplice, things work fine. That seems to point to a problem in Fast-RTPS, but I'll have to investigate more.

@clalancette
Contributor Author

Update: After some poking around, I found that Fast-RTPS wasn't always considering localhost as a valid device to send data on. I've opened up eProsima/Fast-DDS#267 which fixes the problem in my testing (along with the route and lo multicast changes).

@clalancette
Contributor Author

This was fixed by eProsima/Fast-DDS#303, so closing.

@twaddellberkeley

@clalancette I still have this issue. I'm wondering if my Fast-DDS library is out of date. Is there any way to check or update it?

@clalancette
Contributor Author

This is a really old issue, so I don't know. I also don't know what setup you are using. I'd suggest opening a question over at https://answers.ros.org with lots of details about your environment and what you are trying to do.

6 participants