RaspberryPi3 Wireless communication problem #315
Comments
|
At eProsima we haven't been able to reproduce this. We have tested the following environment:
Machines
Machine 1: Raspberry Pi 3 Model B Plus Rev 1.3 with Ubuntu 18.04 and ROS 2 Dashing from binaries
Installed following this guide
Machine 2: x86 Laptop (Dell XPS 15 9560) with Ubuntu 18.04 and ROS 2 Dashing from binaries
ROS2 installed following official documentation
Machine 3: x86 Laptop (Dell XPS 15 9560) with Windows 10 and ROS 2 Dashing from binaries
ROS2 installed following official documentation
Network connection
Router model: Asus RT-N12E
Basic test
Commands run
First run the subscribers on the three machines. Then open another terminal and start a publisher on each machine.
Results
Whenever a publisher was started, all subscribers started receiving messages (first index was either
Additional test 1
On RPi terminal 1, repeat
Results
Additional test 2
On Ubuntu laptop, terminal 2, repeat:
Results
Most of the time it starts receiving messages after less than 5 seconds.
Additional test 3
repeat
Results
RPi starts receiving messages after less than 2 seconds.
Additional test 4
repeat
Results
Conclusion
Sometimes the multicast message announcing the presence of a participant is lost, and communication is restored when the periodic announcement is resent.
Additional remarks
The only multicast traffic on the network was on the participant discovery IP address 239.255.0.1 |
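To make the conclusion above concrete, here is a minimal sketch of how a lost announcement translates into discovery delay. It assumes announcements are sent at a fixed period and that each one is lost independently with the same probability. The function names, the loss probability, and the period used in the examples are all hypothetical and are not taken from Fast-RTPS or from the measurements in this thread.

```python
import math


def prob_undiscovered(loss_prob: float, period_s: float, elapsed_s: float) -> float:
    """Probability that no announcement has arrived after elapsed_s seconds.

    Assumes announcements go out at t = 0, period_s, 2 * period_s, ... and
    that each one is lost independently with probability loss_prob.
    """
    if not 0.0 <= loss_prob <= 1.0:
        raise ValueError("loss_prob must be between 0 and 1")
    if period_s <= 0.0:
        raise ValueError("period_s must be positive")
    if elapsed_s < 0.0:
        return 1.0  # nothing has been sent yet
    sent = int(elapsed_s // period_s) + 1  # announcements sent so far
    return loss_prob ** sent


def expected_delay(loss_prob: float, period_s: float) -> float:
    """Mean time until the first announcement that gets through.

    The number of lost announcements before the first success follows a
    geometric distribution with mean loss_prob / (1 - loss_prob).
    """
    if loss_prob >= 1.0:
        return math.inf
    return period_s * loss_prob / (1.0 - loss_prob)
```

With a hypothetical 50% loss rate and a 3 second period, `expected_delay(0.5, 3.0)` gives a mean delay of one period. Under this model, discovery delays of a few seconds fit an occasional lost announcement, while delays of minutes would need a loss rate close to 100%.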
|
We are seeing quite different results on our network (TP-Link Archer C4000). We just re-ran a few experiments after updating to the latest ROS 2 Dashing packages. All these experiments are with
Experiments
Experiment 1 (FastRTPS)
Machine 1: RPi3 connected over WiFi
Experiment 2 (FastRTPS)
Machine 1: RPi3 connected over WiFi
Experiment 3 (Cyclone)
Machine 1: RPi3 connected over WiFi
Experiment 4 (Cyclone)
Machine 1: RPi3 connected over WiFi
Conclusion
For unknown reasons, FastRTPS does not seem to reliably discover and send messages on our network. However, on the same machines and network infrastructure, CycloneDDS appears to reliably discover and send messages. |
|
@codebot Could you update your comment sharing the RPi3 model (as seen at the beginning of |
|
Hi @MiguelCompany! here are the outputs requested, |
|
@aaronchongth Thank you for posting this. I've seen I am using model 3B+, so I've taken a 3B to repeat the tests with it. I had to upgrade it in order to match your kernel version, and after reboot I had error messages from the firmware, which I fixed following the instructions of this post. I am not saying that your problem is related to this, just putting it here so other users know of this issue. I will repeat the tests and keep you posted here. |
|
We repeated the tests on Thursday and Friday, using a |
|
It's interesting that the problems don't seem to show up in your testing. Here is another report with what I'm speculating is a similar root cause: If you turn off IGMP Snooping, does discovery have problems? Our router has IGMP Snooping turned off by default. |
I tested both with IGMP Snooping on and off, and noticed no difference. The performance of multicast over WiFi also depends on the number of devices connected to the same AP, and is usually bad if different connection kinds (2.4GHz vs 5GHz) are involved. Nevertheless, turning IGMP snooping on usually improves multicast performance and is generally recommended. Regarding this, I checked the network traffic using Cyclone and Fast-RTPS, and I noticed that Cyclone announces the participant on multicast, but only with unicast locators in its DATA(p). This strategy may improve discovery when only two participants are involved, but it generates a lot of network traffic if several participants are started at the same time. |
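The traffic trade-off mentioned above can be illustrated with a deliberately simplified count. The sketch assumes every participant sends one announcement per round, either once to the multicast group or once to each peer over unicast. The function name and the model itself are assumptions made for illustration, not a description of what Cyclone or Fast-RTPS actually put on the wire.

```python
def datagrams_per_round(participants: int, multicast: bool) -> int:
    """Datagrams sent in one announcement round under the simplified model.

    Multicast: each participant sends a single datagram to the group.
    Unicast: each participant sends one datagram to every other participant.
    """
    if participants < 0:
        raise ValueError("participants must be non-negative")
    if multicast:
        return participants
    return participants * (participants - 1)


# Compare both strategies for a few hypothetical network sizes.
for n in (2, 5, 10):
    print(n, datagrams_per_round(n, True), datagrams_per_round(n, False))
```

For two participants both strategies send the same number of datagrams, and the unicast count grows quadratically from there.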
|
In order to reduce multicast traffic, the following |
|
Good news, we could reproduce the problem and have a solution.
We bought the same router @codebot was using in the report. Using that router (TP-Link Archer C4000) we did reproduce the issue. Looking at Wireshark captures we saw that multicast traffic from the RPi3 to the wired PC was not received. So while preparing countermeasures to reduce the multicast traffic to the minimum and improve discovery timing, we did a lot of tests regarding the multicast problem on the router.
After all our tests, we found that when a device on the WLAN is both sending and receiving on the same multicast address, a device on the LAN subscribed to that same address does not receive that multicast traffic. The same happens if the second device is connected to a different band of the WLAN.
An easy way to reproduce the problem:
In order to address this, we've developed several improvements and features directly on branch 1.8.x of Fast-RTPS, to keep it as compatible as possible with ROS 2 Dashing. The changes break the ABI, so when checking the patch, the rmw_fastrtps repo has to be recompiled. Using this repos file we built and tested on Win10 & Ubuntu PCs and an RPi3 with Ubuntu.
@dirk-thomas Would it be possible to perform a patch release of Dashing, changing the version of Fast-RTPS? We (both eProsima and Morgan) think it is really important to address this bug quickly. |
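The router behaviour described above (a WLAN device that both sends and receives on one multicast address, and a LAN device that only listens) can be probed with a small script. This is a sketch under stated assumptions and not the reproduction procedure eProsima used. The port 17400 is a hypothetical value picked to stay clear of real DDS traffic, and the helper names are invented for illustration. Only the group address 239.255.0.1 comes from this thread.

```python
import socket
import struct
import time

GROUP = "239.255.0.1"  # discovery address mentioned in this thread
PORT = 17400           # hypothetical test port, not a DDS port


def membership_request(group: str, interface: str = "0.0.0.0") -> bytes:
    """Build the ip_mreq structure expected by IP_ADD_MEMBERSHIP."""
    return struct.pack("4s4s", socket.inet_aton(group), socket.inet_aton(interface))


def open_receiver(group: str, port: int) -> socket.socket:
    """Open a UDP socket bound to port and subscribed to the multicast group."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM, socket.IPPROTO_UDP)
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
    sock.bind(("", port))
    sock.setsockopt(socket.IPPROTO_IP, socket.IP_ADD_MEMBERSHIP, membership_request(group))
    return sock


def run(name: str, send: bool, duration_s: float = 30.0,
        group: str = GROUP, port: int = PORT) -> None:
    """Listen on the group and, if send is True, also send one datagram per second."""
    rx = open_receiver(group, port)
    tx = socket.socket(socket.AF_INET, socket.SOCK_DGRAM, socket.IPPROTO_UDP)
    tx.setsockopt(socket.IPPROTO_IP, socket.IP_MULTICAST_TTL, 1)  # stay on the local network
    end = time.monotonic() + duration_s
    seq = 0
    try:
        while time.monotonic() < end:
            if send:
                tx.sendto(f"{name} {seq}".encode(), (group, port))
                seq += 1
            tick = time.monotonic() + 1.0
            while True:  # print everything that arrives during this one-second tick
                remaining = tick - time.monotonic()
                if remaining <= 0:
                    break
                rx.settimeout(remaining)
                try:
                    data, addr = rx.recvfrom(1024)
                except socket.timeout:
                    break
                print(f"received {data!r} from {addr[0]}")
    finally:
        rx.close()
        tx.close()
```

One possible use is to call `run("wifi", send=True)` on the WiFi device and `run("wired", send=False)` on the wired one, then check whether the wired side prints the datagrams from the WiFi device. A sender will usually also see its own datagrams, so compare the source addresses.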
|
The referenced PR updated the Dashing repos file to the provided commit hash of FastRTPS. I also created a note on the project board to do a new release before the next sync: https://github.com/orgs/ros2/projects/12 |
|
@MiguelCompany Have these changes made it onto the master branch? The code has changed somewhat significantly, but it looks like master is still using the old values: https://github.com/eProsima/Fast-RTPS/blob/9d562024886f4e1e7be363356ab032ecba934490/include/fastrtps/rtps/attributes/RTPSParticipantAttributes.h#L187 |
|
@clalancette We gave priority to fixing this for current users, and made the necessary changes on 1.8.x directly. We are getting this into 1.9.x through eProsima/Fast-DDS#744 and will cherry-pick from there to master. |
|
@clalancette The changes made their way to master on eProsima/Fast-DDS#760. I think this can be closed. |
|
Sounds good, thanks. Will close it out. |
|
This issue has been mentioned on ROS Discourse. There might be relevant details there: https://discourse.ros.org/t/ros2-default-behavior-wifi/13460/2 |
Note: This was initially reported by @codebot by e-mail to eProsima. It is posted here following a suggestion by @dirk-thomas
Bug report
Required Info:
Steps to reproduce issue
Expected behavior
Strings published from Machine 1 start printing on the console of Machine 2, after allowing a few seconds for discovery and connection.
Actual behavior
Usually, nothing is printed on the console of Machine 2. If left to run for a long time, sometimes after ~3 to ~5 minutes a few strings will print to the console intermittently, but usually (probably 90-99% of the time) it is not receiving messages. I don't know if this is due to WiFi jitter/latency/packet-drops on the RPi3, or the relatively slow CPU and I/O on the RPi3 causing some timeouts to be missed, or what's going on exactly. This behavior appears "sometimes" on laptops on WiFi, but it is more hit-and-miss. With the RPi3 it's typically much easier to reproduce the issue.
Additional information
Router model (in case it matters): TP-Link Archer C4000
Machine 1: connected via (wired) Gigabit Ethernet
Machine 2: connected via the RPi3 built-in WiFi, which is not super awesome