This repository has been archived by the owner on Jul 26, 2024. It is now read-only.

RGB Pointcloud slows down the publishing rate a lot #56

Open
RoseFlunder opened this issue Aug 22, 2019 · 18 comments
Assignees
Labels
bug (Something isn't working) · help wanted (Extra attention is needed) · triage approved (The Issue has been reviewed and approved for development by the Azure Kinect ROS Driver Team)

Comments

@RoseFlunder
Contributor

Describe the bug
When rgb_point_cloud is enabled, the node's performance degrades significantly. Without the colorized point cloud, the node maintains a steady 30 Hz publishing rate when the Kinect runs at 30 FPS.
With rgb_point_cloud enabled, the publishing rate drops to roughly 10-12 Hz.
When visualizing the point cloud in RViz, the slow publishing rate leads to a laggy view:
RViz itself maintains a high enough frame rate to display the point cloud, but the data arrives too slowly.
When using the k4aviewer, there is no slowdown in the colorized point cloud view.

To Reproduce
Steps to reproduce the behavior:

  1. Start the node via the driver.launch and use fps:=30, depth_mode:=NFOV_UNBINNED, rgb_point_cloud:=false
  2. Run `rostopic hz points2` in another terminal
  3. Restart the node with rgb_point_cloud:=true
  4. Run `rostopic hz points2` again and compare the rate with step 2

Expected behavior
Maintain the 30 FPS rate for publishing the rgb pointcloud on machines that can easily view the colorized point cloud at 30 FPS in the k4aviewer.

Desktop (please complete the following information):

  • OS: Ubuntu 18.04
@RoseFlunder RoseFlunder added bug Something isn't working triage needed The Issue still needs to be reviewed by the Azure Kinect ROS Driver Team labels Aug 22, 2019
@skalldri
Contributor

Confirmed this is occurring on my ROS Melodic, Ubuntu 18.04 machine. I suspect the main image processing loop is running too slowly on the CPU, since the K4A Viewer does the RGB point cloud math on the GPU.

I'll investigate ways I could do this faster, potentially using SIMD instructions.

@skalldri
Contributor

Confirmed: the image processing loop takes between 40 and 70 ms when the RGB point cloud is enabled. I'll look for ways to potentially speed it up.

@skalldri skalldri added help wanted Extra attention is needed triage approved The Issue has been reviewed and approved for development by the Azure Kinect ROS Driver Team and removed triage needed The Issue still needs to be reviewed by the Azure Kinect ROS Driver Team labels Aug 26, 2019
@RoseFlunder
Contributor Author

RoseFlunder commented Aug 26, 2019

@skalldri
Is a GPU-accelerated version planned?
usage.md states the following, but as far as I can tell none of the SDK methods currently used involve any GPU acceleration:

> Using the point cloud functions in this node will provide GPU-accelerated results, but the same quality point clouds can be produced using standard ROS tools.

Today I saw this example in the sensor sdk repository:
Fast Pointcloud Example
Would this version be faster even on the CPU, or is it comparable? I assume this is the algorithm the k4aviewer implements on the GPU?

Tomorrow I am planning to use the CPU version of this algorithm in the node to check its performance compared to the current version.
If I get it working I will create a branch in my fork for it.

@skalldri
Contributor

The fast point-cloud example does not produce RGB point clouds, just regular point clouds. It's just a way to produce point clouds quickly without using SSE3.

My documentation in usage.md is incorrect: we use SSE3-accelerated point cloud math, not GPU-accelerated. See:

https://github.com/microsoft/Azure-Kinect-Sensor-SDK/blob/c0800a43fb42200935aa9a8da58b8bd5db3d55c9/src/transformation/rgbz.c#L1210

Yes: K4A Viewer does this all on the GPU. Since K4A viewer is going to be drawing everything to the screen as the final destination (rather than writing it to a ROS message), it makes sense for it to do as much work as possible in a shader. They wrote a shader that accepts a depth image and color image as input, as well as the X-Y tables, and then naively draws the RGB point cloud. That's a very fast operation so it runs super quickly.

Copying the point cloud and RGB frame to the GPU, doing some work, then copying it back off the GPU is going to take time. I'd like to take a stab at improving the speed of the CPU-based version before resorting to a GPU implementation.
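For reference, the XY-table approach described above can be sketched on the CPU. This is a minimal illustration of the technique, not the SDK's actual API; the struct and function names here are hypothetical:

```cpp
#include <cstdint>

// Hypothetical sketch of the XY-table technique: a precomputed table stores,
// for each depth pixel, the normalized ray direction (x/z, y/z). Turning a
// depth sample into a 3D point is then one multiply per axis.
struct XyEntry {
  float x;  // precomputed x/z for this pixel
  float y;  // precomputed y/z for this pixel
};

// depth_mm: raw depth in millimetres; outputs the point in metres.
inline void depthToPoint(uint16_t depth_mm, const XyEntry& ray,
                         float& px, float& py, float& pz) {
  const float z_m = depth_mm / 1000.0f;  // mm -> m
  px = ray.x * z_m;
  py = ray.y * z_m;
  pz = z_m;
}
```

A shader can do exactly this per pixel, which is why the k4aviewer's GPU path is so cheap.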

@RoseFlunder
Contributor Author

Thanks for the detailed answer :) I'm looking forward to it, because that's currently one of the main blockers in our project.

@RoseFlunder
Contributor Author

RoseFlunder commented Sep 12, 2019

@skalldri
As I already mentioned in another issue, I did some timing measurements.
In this gist I pasted the modified part of the RGB point cloud method together with the output.
I don't know how accurate ROS Time is, but we can clearly see that the SDK's methods are fine performance-wise.
What's really dragging things down is the loop that builds the message:
https://gist.github.com/RoseFlunder/0cca13211022049ae7abc98ea600a7de

EDIT:
Building the package in release mode drastically speeds this loop up.
Use: `catkin_make -DCMAKE_BUILD_TYPE=Release`
The loop is about 3 times faster with this (~14 ms instead of ~45 ms in my example).
I added the output for this build to the gist as well.

EDIT2:
Just ran another test to see what happens when local variables like BgraPixel and depth_point_3d are replaced with expressions reading directly from the buffers, folding the divide-by-1000 into the expression as well.
That saves about 1 ms on average, so keeping the local variables for readability isn't a big deal.
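The direct-buffer version of that loop might look roughly like this. The buffer layouts (int16 XYZ in millimetres, BGRA colour) follow the SDK's image formats, but the function and its signature are illustrative, not the driver's actual code:

```cpp
#include <cstddef>
#include <cstdint>

// Illustrative inner loop: read straight from the SDK's int16 XYZ buffer
// (millimetres) and BGRA colour buffer, folding the mm -> m conversion
// into the expressions instead of going through named temporaries.
void fillXyzRgb(const int16_t* xyz_mm, const uint8_t* bgra, size_t num_points,
                float* out_xyz, uint8_t* out_rgb) {
  for (size_t i = 0; i < num_points; ++i) {
    out_xyz[3 * i + 0] = xyz_mm[3 * i + 0] / 1000.0f;
    out_xyz[3 * i + 1] = xyz_mm[3 * i + 1] / 1000.0f;
    out_xyz[3 * i + 2] = xyz_mm[3 * i + 2] / 1000.0f;
    // BGRA in, RGB out
    out_rgb[3 * i + 0] = bgra[4 * i + 2];
    out_rgb[3 * i + 1] = bgra[4 * i + 1];
    out_rgb[3 * i + 2] = bgra[4 * i + 0];
  }
}
```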

@skalldri
Contributor

Thanks for digging into this issue RoseFlunder. I suspect the iterator used to fill up the PointCloud2 message is very expensive. There is almost certainly a faster way to copy the data from the SDK structures into the PointCloud2 message.

Unfortunately, I'm on vacation until the 25th so I won't be able to make much progress on this until I get back.

@skalldri skalldri self-assigned this Oct 10, 2019
@skalldri skalldri removed the help wanted Extra attention is needed label Oct 10, 2019
@skalldri
Contributor

Going to start having a look at this.

@skalldri
Contributor

skalldri commented Oct 22, 2019

EDIT:

Never mind, the original code has a simple mistake: not calling pcl_modifier.resize() before trying to modify the point cloud! Using the sped-up branch helps, but still doesn't get us over the finish line. Digging further into Valgrind on the latest code.

ORIGINAL:

Well, Valgrind seems to think that using PointCloud2Iterator is very expensive:

[Valgrind screenshot: PointCloud2Iterator hotspots]

Combined, PointCloud2Iterator::operator++ and PointCloud2Iterator::operator* account for ~20% of the time spent in getRgbPointCloud()! While not confirmed, this makes me think that we could see significant speedups if we do this the old fashioned way.

I'll see if I can build a version of the point cloud functions that doesn't use the iterator.
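An iterator-free fill could be sketched as below. The field offsets and point_step are assumptions for a typical xyz+rgb PointCloud2 layout; a real implementation should take them from the message's field descriptions:

```cpp
#include <cstddef>
#include <cstdint>
#include <cstring>

// Sketch of "the old fashioned way": write each point directly into a
// PointCloud2-style byte buffer using fixed field offsets and point_step,
// instead of advancing one PointCloud2Iterator per field. Assumed layout:
// float32 x/y/z at offsets 0/4/8, packed rgb at offset 16, point_step 32.
void writePoint(uint8_t* data, size_t index, size_t point_step,
                float x, float y, float z,
                uint8_t r, uint8_t g, uint8_t b) {
  uint8_t* p = data + index * point_step;
  std::memcpy(p + 0, &x, sizeof(float));
  std::memcpy(p + 4, &y, sizeof(float));
  std::memcpy(p + 8, &z, sizeof(float));
  p[16] = b;  // the packed rgb float stores bytes in b, g, r order
  p[17] = g;
  p[18] = r;
}
```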

@skalldri
Contributor

It's good to see that this is using memset_avx2_erms: that's probably the compiler optimizing the PointCloud2 alloc.

@emmanuel-senft

Hi, I am also affected by this bug, and I was wondering whether there has been any progress on it?

@ooeygui ooeygui added the help wanted Extra attention is needed label Jan 10, 2020
cthorey pushed a commit to Xihelm/Azure_Kinect_ROS_Driver that referenced this issue Mar 27, 2020
Using fillColorPointCloud is too slow. There is an open bug
microsoft#56
about it.

Instead, this publishes only the xyz values in the camera frame, which
is what we need. The number of points equals h*w of the raw_image,
so the cloud can easily be recombined later if needed.
@anastasiabolotnikova

Hello, glad to know that this issue is getting attention and already started to be investigated. Thanks @RoseFlunder @skalldri for sharing your findings!

This is indeed a limitation of the current driver implementation that should be addressed to speed up the loop cycle. Both fillColorPointCloud and fillPointCloud could be much faster if going over the entire point cloud sequentially point by point is somehow avoided.

Three questions that I would like to ask regarding this issue:

  1. Is there no way of converting the 3D point coordinate data from the K4A SDK structure (e.g. a buffer) into the sensor_msgs::PointCloud2.data binary blob using memcpy? I suppose that would be the fastest way to deal with it on machines without a GPU?

  2. Why is a new PointCloud2 created and resized on every cycle? I suppose it only needs to be created and sized correctly once, before the while loop of framePublisherThread?

  3. A parallel implementation of fillColorPointCloud and fillPointCloud for machines with GPU would be great! Especially if there is no way to copy points from SDK structure into ROS message using memcpy. Could you share with us any information regarding the plans of this implementation?

Thank you in advance for the reply!
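The preallocation idea in question 2 can be sketched like this; the type and sizes are illustrative (NFOV_UNBINNED depth is 640x576), not the driver's actual code:

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Sketch of preallocation: since a given depth mode has a fixed resolution,
// the message buffer can be sized once before the publish loop instead of
// being recreated and resized every frame.
struct CloudBuffer {
  size_t width = 0, height = 0, point_step = 0;
  std::vector<uint8_t> data;

  void allocateOnce(size_t w, size_t h, size_t step) {
    width = w;
    height = h;
    point_step = step;
    data.resize(w * h * step);  // single allocation, reused every frame
  }
};
```

In the real driver the sensor_msgs::PointCloud2 itself would be the member that is kept across iterations, so only its data bytes are rewritten per frame.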

@ooeygui
Member

ooeygui commented Apr 13, 2020

Hi All,
We have a few Azure Kinect performance-related issues on our backlog, as well as migrating to ROS2, but we likely won't be able to start working on them until the summer.

@YoshuaNava

YoshuaNava commented Oct 19, 2020

@skalldri I'm also having issues with the PointCloud2Iterator, although I'm not a user of the Azure Kinect. Do you still plan to optimize it? I would be happy to collaborate to find better ways to use it, or to come up with improvements.

@ooeygui
Member

ooeygui commented Oct 19, 2020

Hi @YoshuaNava ,
This work is still on the backlog.

The code currently does this work linearly in C++, in:

k4a_result_t K4AROSDevice::fillColorPointCloud(const k4a::image& pointcloud_image, const k4a::image& color_image,

We are looking at CUDA for this codepath when available.

@YoshuaNava

@ooeygui Thank you for your prompt answer.

I see. I have continued using the iterator. So far it has let me speed up an old raw deserialization method by 3-5x, which is a good gain for relatively low effort.

I'll share any insights found while I work on this.

@ibrachahine

Hi, I am prototyping an application that requires the depth topic on ROS, and it seems to be very slow, at 1 Hz. Are there any recent updates on this issue?

Thanks

@ooeygui
Member

ooeygui commented Mar 11, 2022

@ibrachahine Thank you for your interest and for asking here. We have not invested in fixing this bug, but would accept a pull request. Have you tried working with parameters on the node to optimize for your hardware and scenario?


7 participants