Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Improving jitter through the driver and middleware stack for data processors #41

Merged
merged 3 commits into from
May 1, 2020
Merged

Conversation

tpanzarella
Copy link
Member

In both the pointcloud_processor and scan_processor (the latter is really insignificant) this patch has the driver give up ownership of the message at the time of data publishing. This allows for middleware optimizations to include the elimination of unnecessary copies. Several tests on my machine show order of magnitude jitter improvements through the driver + middleware stack both out-of-process and while running as a component to sub-millisecond levels. Here are some data:

NOTE: msg_stamp here is the time stamped on the message by the driver (so, how consistent is the LiDAR at delivering the data). recv_stamp here is the time received by a subscriber (so, how long spent in the driver and through the ROS2 middleware before a subscriber can work with the data). I'm using TIME_FROM_ROS_RECEPTION as my timestamp mode so I can truly measure time spent in the driver and ROS2 middleware and not worry about in-LiDAR or network latency.

Before this patch, here is what the data looked like for the pointcloud processor (these are for interprocess communication tests over 1,000 messages -- the effect is similar when run as a component):

test-case-5_1024x10_raw_latency

Here is a quantile plot of the same:

test-case-5_1024x10_q_latency

Here are the end-to-end numbers for this same set of data (again, before this patch):

test-case-5_1024x10_e2e_raw_latency

test-case-5_1024x10_e2e_q_latency

Here is what the data look like with this patch:
(Not perfect, there is much more we can do, but significantly improved in terms of jitter)

test-case-6_1024x10_raw_latency

test-case-6_1024x10_q_latency

test-case-6_1024x10_e2e_raw_latency

test-case-6_1024x10_e2e_q_latency

In terms of the end-to-end numbers, prior to this patch, the median end-to-end pointcloud delivery latency was 18.164 ms, however, the median absolute deviation (MAD) was 4.256 ms. With this patch, the median latency was 23.706 ms, but the MAD was only .252 ms. So, the patch significantly lessens the jitter which makes the system more stable. We can always tune to lessen absolute latency. Indeed, in my setup, the DDS could be further tuned.

The driver now gives up ownership of the PointCloud message in the
pointcloud_processor at publish time to allow for middleware optimizations to
include the elimination of unnecessary copies. Several tests on my machine show
order of magnitude jitter improvements through the driver + middleware stack
both out-of-process and while running as a component to sub-millisecond
levels.
@SteveMacenski
Copy link
Member

SteveMacenski commented May 1, 2020

This allows for middleware optimizations to include the elimination of unnecessary copies

Is this for the case of using the component? (edit: reread the post. That's interesting, do you know why this is?) I'm surprised for using the non-real-time rclcpp API that the inter-process comms would be improved so much that way. I'm just trying to fully understand why that is the case so I can apply that to other places in the ROS ecosystem that could benefit. Excellent analysis, thank you.

This looks good, it'll go in as is.

@tpanzarella
Copy link
Member Author

I believe it has to do with cutting down on the number of data copies and allocations once the data leaves the driver. As the code was written prior, the msg was getting passed to publish as a reference. So, for safety, the middleware must make a copy of the data right off the bat and to support that copy there would be an allocation as well. By passing a unique_ptr to publish we are explicitly communicating our ownership intentions to the middleware. Specifically, that we relinquish ownership. I believe, internally, when the middleware takes receipt of this unique_ptr, it immediately promotes it to a shared_ptr (details beyond this comment) but that does not incur the copy or allocation. It's pretty cheap.

I have to play a bit with the intra-process comms. I have not gotten this to give us true zero-copy yet but I haven't looked that hard. I will spend some time with that next week as it is relevant to what we are currently working on.

@SteveMacenski
Copy link
Member

SteveMacenski commented May 1, 2020

That begs the question to me then: why does ROS2 even allow you to publish a non-pointer type? It seems like that's a reasonable burden to put on developers for that much increase in stability.

@tpanzarella
Copy link
Member Author

I certainly like the smart pointers as a matter of taste as ownership semantics are clearly communicated. It is a facility provided to us by the language and we should exploit it. I suppose the same performance could be achieved by casting a non-pointer type to an r-value reference, however, that can silently degrade to an l-value reference and you would not realize it until the performance became a problem. The smart pointers make the intention clear.

@tpanzarella
Copy link
Member Author

I should also note that I have done a bit of tuning on my machine (DDS, UDP buffer sizes, etc.), however, the delta shown in those plots are simply as a result of the change in the way we are publishing through the driver.

@tpanzarella
Copy link
Member Author

Let me know if you want me to handle the merge.

@SteveMacenski
Copy link
Member

@wjwwood maybe something to consider in your API review for rclcpp. I think it would make alot of sense to force the use of smart pointers if this is the kind if impact it can make.

@SteveMacenski SteveMacenski merged commit ed188b3 into ros-drivers:eloquent-devel May 1, 2020
@wjwwood
Copy link
Member

wjwwood commented May 2, 2020

The reason why we allow a reference is to allow the user to have messages on the stack if they want or to reuse a message loop to loop. Because you give up ownership you need to allocate a new one each time.

You could also potentially get a similar performance improvement by using the middleware publisher allocation feature which lets the middleware preallocate resources and reuse them on each publish call rather than needing to allocate resources each publish. Not all the middleware implementations leverage this optimization but they could in the future. @mjcarroll worked on that feature maybe he could point you to an example, I’m on my phone atm and don’t have an example at hand.

For the zerocopy intraprocess you just need to publish and subscribe with unique ptr and ensure there’s only one publisher and subscription pair.

One further thing you could do is use the loaned message api where you can get a message from the middleware, fill it out, and then publish it (returning ownership to the middleware). This can allow the middleware to do zero copy, even Between processes. Iceoryx does this, @Karsten1987 gave a ROSCon talk about this.

@SteveMacenski
Copy link
Member

The reason why we allow a reference is to allow the user to have messages on the stack if they want or to reuse a message loop to loop

Ah, I do remember us talking about that. Thanks for the reminder. Now that I'm in front of a computer, I'm scouring all of the navigation repos for instances where we don't do that to update to work with unique_ptrs at the bare minimum. We play around with too many pointclouds/images/maps to not use that boost.

I appreciate the detailed response. I think I've covered the major bases with tickets where updates are required.

@tpanzarella
Copy link
Member Author

Hey @wjwwood thanks for your comments. Quick point of clarity on the zero-copy strategy...

For the zerocopy intraprocess you just need to publish and subscribe with unique ptr and ensure there’s only one publisher and subscription pair.

This makes sense that zero-copy would work in this scenario. I assume when there is a 1-N (producer->consumer) topology, subscriptions to a std::shared_ptr<const T> should also get the data in true zero copy, no?

Also, something basic related to this that maybe you can help with. Is it a true statement that to get zero-copy the .use_intra_process_comms(true) flag needs to be set on the NodeOptions passed to the Node ctor?

This AM, I wanted to see if I could get this to work correctly. However, I ran into a problem. The driver is implemented as a managed (lifecycle) node. As the driver was coming up and an external maanger was pushing it through its FSM, I got the following exception:

[ouster_driver-1] [ERROR] []: Caught exception in callback for transition 10
[ouster_driver-1] [ERROR] []: Original error: intraprocess communication allowed only with volatile durability
[ouster_driver-1] [WARN] []: Error occurred while doing error handling.
[ouster_driver-1] [ERROR] [ouster_driver]: Handing error in Ouster driver node.

I got the exact same behavior when 1) running the driver standalone (not in a component container) and driving the FSM via launch and 2) using a custom lifecycle manager to drive the FSM when running the driver in a component container (as there is currently a limitation in launch that we cannot define a managed lifecycle component so we need a custom manager for this).

My direct question is: is it possible to do zero-copy / intraprocess comms with managed nodes? If yes, I think there is a fundamental piece I am missing. Any insights you can lend are really appreciated. Pinging @mjcarroll on this too as I think he is pretty deep on these topics as well -- Hi Mike :-)

The one particular line of the error above that has me puzzled is:

[ERROR] []: Original error: intraprocess communication allowed only with volatile durability

I know what it is trying to say, but, is there a way that I can communicate to the middleware something like: "hey, do intraprocess if you can but for things like messages whose QoS preclude it, just take the interprocess approach." All we really care about are zero-copy semantics for the data, those with rclcpp::SensorDataQoS (i.e., among other things: volatile durability).

One further thing you could do is use the loaned message api where you can get a message from the middleware, fill it out, and then publish it (returning ownership to the middleware). This can allow the middleware to do zero copy, even Between processes. Iceoryx does this, @Karsten1987 gave a ROSCon talk about this.

Yes. I am looking forward to getting Iceoryx set up. It gives us the best of both worlds: Zero-copy / low latency data transformation pipelines with 3D LiDAR scans and the loose coupling / modularity of using out-of-process nodes.

@SteveMacenski
Copy link
Member

@tpanzarella your charts are very compelling, I've be interested to see what that looks like with ROS1's Ouster Drivers. I think showing a side by side of "this than that" could be a really good propaganda piece for ROS2 docs / a Box Robotics blog post.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants