Low bandwith when using FastRTPS #81

Closed
vilhjalmur89 opened this issue Dec 15, 2016 · 24 comments
Labels
more-information-needed Further information is required

Comments

@vilhjalmur89

I'm experiencing problems with most messages that are more than a few kilobytes per second when using FastRTPS.

For example, I just built ROS2 from source on Ubuntu 16.04 and tried running showimage in one terminal and cam2image in another. I can see images from my webcam, but they are delayed and I get around 1 frame per second.

This also happens with downloaded binaries, but it works fine with opensplice.
The talker/listener demo and other low-bandwidth demos seem to work fine.

@wjwwood
Member

wjwwood commented Dec 16, 2016

@vilhjalmur89 thanks for the report. We've tried debugging this a few times in the past and we're still iterating with the Fast-RTPS guys to get some default settings that work well in these cases. Can you post what settings (command line arguments) you're using for the cam2image and showimage programs? If you're not using any, can you run them with -h and try using the settings to lower the resolution? In the past I've experienced the case where larger images (over 300 KB) do not perform well out-of-the-box, but smaller images work better.

This has to do with the default flow control settings I think (not sure though). You can see more about that discussion here: #36 (comment)

Hopefully we can improve this use case moving forward. Since my experience with opensplice and connext has been pretty good with this demo, I think we just need to zero in on some better default settings for Fast-RTPS. @richiprosima may have other suggestions for you.

@vilhjalmur89
Author

@wjwwood I just ran the commands without any arguments, but by reducing the resolution to 160x90 and the frame rate to 10 FPS, I get a stable stream. That is about 400 KB/s (each image is around 40KB), and anything larger results in lagging.
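
As a rough check of those numbers, the sketch below computes the raw size of one frame and the resulting data rate. It assumes an uncompressed encoding with 3 bytes per pixel (such as bgr8), which is not stated above; the function names are illustrative only.

```python
def frame_size_bytes(width: int, height: int, bytes_per_pixel: int = 3) -> int:
    """Raw payload size of one uncompressed frame (assumes 3 bytes per pixel)."""
    return width * height * bytes_per_pixel


def data_rate_bytes_per_s(width: int, height: int, fps: int,
                          bytes_per_pixel: int = 3) -> int:
    """Payload bytes per second for a stream of uncompressed frames."""
    return frame_size_bytes(width, height, bytes_per_pixel) * fps


size = frame_size_bytes(160, 90)           # 43200 bytes, roughly 40 KB
rate = data_rate_bytes_per_s(160, 90, 10)  # 432000 bytes/s, roughly 400 KB/s
print(f"{size / 1000:.1f} KB per frame, {rate / 1000:.0f} KB/s")
```

Under that assumption the figures line up with the roughly 40 KB per image and 400 KB/s reported here. Message headers and RTPS overhead are not included.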

I noticed that when lagging occurs, the publishing node also slows down. Are publishers not asynchronous by default? That seems to be the case with OpenSplice: large images may be delayed or dropped, but the publisher is unaffected.

@rohbotics

I am having a similar issue when trying to run the turtlebot follower demo.

The Astra node publishes VGA resolution floating point depth images at a reasonable rate (based on printfs at publish).

When I start the follower node, which subscribes to the depth images, both the astra node and the follower node slow down to a frame every 7 seconds. (Again timings based on printfs).

@vilhjalmur89
Author

vilhjalmur89 commented Dec 20, 2016

I tried running image_pipeline_all_in_one from the beta-binary-release.

image_pipeline_all_in_one

Now the images are passed as pointers between nodes within the same process, but I still get the same results (less than one frame per second).

@dirk-thomas
Member

@vilhjalmur89 I had the same experience on Linux when trying image_pipeline_all_in_one with the beta 1.

@richiprosima
Contributor

I've tested beta 1 cam2image - showimage and image_pipeline_all_in_one. I used the source code, not binaries. I didn't notice any delay. Did anyone test with source code? Should I test binaries?

The environment I used was:

  • 2 computers, each with an Intel Core i3-3240 CPU
  • 2 types of network cards: Realtek RTL8111/8168/8411 and Intel 82574L (lspci output)
  • A logitech webcam

@richiprosima
Contributor

Sorry, I do see the delay when increasing the resolution of the image. I will check this. Thanks

@richiprosima
Contributor

The problem is caused by two facts:

  • Subscribers are configured by default to receive using both multicast and unicast. This puts an excess of samples on the network. I've changed this behavior in eProsima/Fast-DDS@9d655e0. Now subscribers are configured by default to receive using unicast.

  • rmw_fastrtps_cpp uses the default heartbeat period. It is five seconds, which is too long for this kind of application. This causes the lags. Currently this configuration is commented out. Uncommenting these lines should improve performance.
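
To illustrate the second point, here is a minimal sketch of why a long heartbeat period shows up as lag. It assumes a simplified model of reliable delivery in which a lost sample is only re-requested after the reader sees the writer's next periodic heartbeat. The function name and the timings in the usage lines are hypothetical, and real Fast RTPS behavior involves more than this.

```python
def wait_for_next_heartbeat_ms(loss_time_ms: int, heartbeat_period_ms: int) -> int:
    """Time from a sample loss until the next periodic heartbeat.

    Simplified model: heartbeats fire at multiples of the period, and the
    reader can only ask for a retransmission after receiving one.
    """
    if heartbeat_period_ms <= 0:
        raise ValueError("heartbeat period must be positive")
    return heartbeat_period_ms - (loss_time_ms % heartbeat_period_ms)


# A sample lost 1234 ms after startup (hypothetical timing):
print(wait_for_next_heartbeat_ms(1234, 5000))  # five-second period
print(wait_for_next_heartbeat_ms(1234, 10))    # 10 ms period
```

In this model the wait before a retransmission can even be requested is bounded by the heartbeat period, so a period of seconds can stall a stream for seconds while a period of milliseconds cannot.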

@dirk-thomas
Member

I guess that these changes are the reason why so many new test failures across all platforms came up this morning: http://ci.ros2.org/job/ci_linux/2110/ http://ci.ros2.org/job/ci_osx/1628/ http://ci.ros2.org/job/ci_windows/2122/

@richiprosima
Contributor

Sorry. I only checked our CI. I solved it and ran in your Linux environment. I hope these new changes solve your CI jobs.

Thanks @dirk-thomas

@dirk-thomas
Member

I reran a few of the nightly jobs and they still fail many tests. I haven't looked further at them yet: http://ci.ros2.org/view/nightly/job/nightly_linux_debug/307/ http://ci.ros2.org/view/nightly/job/nightly_osx_debug/307/ http://ci.ros2.org/view/nightly/job/nightly_win_rel/284/

@richiware
Contributor

With multicast removed by default, interoperability with RTI appears to be broken. I checked and found that RTI is using the global user multicast locators sent by the Fast RTPS participant, and not the locators sent by the Fast RTPS reader.
I've updated Fast RTPS master branch. Now Fast RTPS participant doesn't send default user multicast locators. Running ROS2 tests on my Linux system is working.
Could you check with your CI? Thanks.

@dirk-thomas
Member

Looks much better: http://ci.ros2.org/job/ci_linux/2132/ http://ci.ros2.org/job/ci_osx/1634/ http://ci.ros2.org/job/ci_windows/2137/ Thank you for the update.

I will merge ros2/ros2#300 then to use master again.

@richiprosima
Contributor

Returning to the topic of this issue. Could someone check whether there are improvements with the new changes in Fast RTPS and a shorter heartbeat period in rmw_fastrtps_cpp? I expect you will see improvements, as I saw on my system.

@dirk-thomas
Member

I just ran the two separate processes cam2image and showimage and the performance with FastRTPS is still pretty bad for me.

@RameezI

RameezI commented Jan 27, 2017

I have noticed that performance is highly dependent on the network: if I am connected to a LAN it works much better. (I am running the nodes on the same machine, though.)

Furthermore, if I am not connected to a network, demos do not start at all. I also tried image_pipeline_all_in_one, same behavior.

@JavierIH

JavierIH commented Feb 1, 2017

Hi @rameezl, are you using the latest master version of Fast RTPS? I'm looking into what could be producing this issue. Could you provide more information about what settings you are using and whether you have changed the heartbeat period as proposed by @richiprosima?

@RameezI

RameezI commented Feb 3, 2017

Hi @JavierIH I just pulled from master branch of Fast RTPS and applied settings proposed by @richiprosima.

I do not see the delay now and it's much better. My settings:
cam2image -x 640 -y 480

Also, changing from my local network to WAN does not cause performance degradation anymore but the subscriber still stalls if the network is changed or disconnected in between.

The coupling with the network connection is present even in case of intra-process communication.
The demos do not work in absence of a network connection.

In the case of image_pipeline_all_in_one, the delay is a lot more, almost two seconds!

@clalancette
Contributor

I've been having similar problems using the cam2image and showimage test programs in ROS2 with Fast-RTPS. In my case, I was using the default 320x240 resolution for cam2image, and the behavior I saw was to get a bunch of publications, then have a long delay (15-20 seconds), then get another bunch of publications, etc.

I uncommented the lines in rmw_fastrtps that set the heartbeat to 10ms as suggested by @richiprosima , and things are much better. I don't get the queueing behavior, and the data is streaming at 1280x960@30fps (though there is still some lag in the stream). So that looks to improve the situation a lot for me. Is that change something we want to get landed in the code? What's the downside to doing so?

@mikaelarguedas
Member

Indeed, the value of the heartbeat period does influence performance a lot, depending on the application. I'm not sure that changing the default to 10ms for every created topic is the way to go. While it would improve the performance of high-throughput, high-bandwidth applications, it would flood the network in any kind of large-scale system (many nodes) with sporadic or low-frequency communication.
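
As a rough illustration of that trade-off, the sketch below estimates periodic heartbeat traffic. It assumes every reliable writer sends one heartbeat per period regardless of how much data it publishes; the writer count is a made-up figure and the function name is illustrative.

```python
def heartbeats_per_second(num_writers: int, heartbeat_period_ms: int) -> float:
    """Periodic heartbeats per second across all writers (simplified model)."""
    return num_writers * 1000 / heartbeat_period_ms


# 500 writers is a hypothetical system size, not a measured one.
print(heartbeats_per_second(500, 5000))  # five-second period
print(heartbeats_per_second(500, 10))    # 10 ms period
```

Under this model the heartbeat rate scales inversely with the period, so going from five seconds to 10 ms multiplies the background traffic by 500 even for topics that rarely publish.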

We discussed in the past exposing more qos settings through the rmw interface, but haven't decided which ones yet.
A good middle-ground would be not to expose it explicitly but to set a different heartbeat period according to the qos profile used (e.g. use big heartbeat period for default profile but smaller for the sensor_data profile)
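
A minimal sketch of that middle ground, assuming a simple lookup keyed by profile name. This is not the rmw interface; the table, the function name, and the fallback behavior are hypothetical, and the values are only the two periods mentioned in this thread.

```python
# Heartbeat period per QoS profile, in milliseconds.
# Placeholder values: 5000 ms is the default mentioned in this thread and
# 10 ms is the value that was tried for image streaming.
HEARTBEAT_PERIOD_MS = {
    "default": 5000,
    "sensor_data": 10,
}


def heartbeat_period_for(profile_name: str) -> int:
    """Return the heartbeat period for a profile, falling back to the default."""
    return HEARTBEAT_PERIOD_MS.get(profile_name, HEARTBEAT_PERIOD_MS["default"])


print(heartbeat_period_for("sensor_data"))
print(heartbeat_period_for("some_other_profile"))
```

The point of the lookup is that users never set the heartbeat directly; they pick a profile and the middleware layer chooses the period.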

Does anyone know what is the heartbeat period of this demo on connext?

@wjwwood
Member

wjwwood commented Feb 22, 2017

It seems like Connext may have a different scheme for specifying the heartbeat (at least at a glance):

https://community.rti.com/content/forum-topic/piggyback-heartbeat-period

@richiprosima
Contributor

As @wjwwood comments, Connext has two heartbeat mechanisms: the heartbeat period and the piggyback heartbeat. The second one introduces extra heartbeats depending on the batch of data sent. The more data is sent, the more heartbeats are included.
Piggyback is not implemented yet on Fast RTPS, but it is in our roadmap. Meanwhile the heartbeat period could be decreased.
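
The difference between the two mechanisms can be sketched with a simple count. This assumes a model in which one piggyback heartbeat is attached after every fixed number of samples; `samples_per_piggyback` is a hypothetical parameter for illustration, not a Connext or Fast RTPS setting.

```python
def count_heartbeats(samples_sent: int, duration_ms: int,
                     heartbeat_period_ms: int, samples_per_piggyback: int) -> dict:
    """Count periodic and piggyback heartbeats under a simplified model."""
    # Periodic heartbeats depend only on elapsed time.
    periodic = duration_ms // heartbeat_period_ms
    # Piggyback heartbeats depend only on how much data was sent.
    piggyback = samples_sent // samples_per_piggyback
    return {"periodic": periodic, "piggyback": piggyback}


# Ten seconds of a 30 Hz image stream versus a 1 Hz status topic,
# both with a five-second period and a piggyback every 8 samples.
print(count_heartbeats(300, 10_000, 5000, 8))
print(count_heartbeats(10, 10_000, 5000, 8))
```

In this model a busy stream gets frequent heartbeats without shortening the period for quiet topics, which is why piggybacking avoids the flooding concern raised earlier.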

@mikaelarguedas
Member

@vilhjalmur89 Is this still a problem? Fast-RTPS now provides a piggyback heartbeat and I cannot reproduce this issue as of beta2.

mikaelarguedas added the more-information-needed label Aug 4, 2017

@dirk-thomas
Member

Closing this for now due to no response. Please feel free to comment on the closed ticket and it can be reopened if necessary.


10 participants