
Implement lock free ring buffer with throttling [HUMBLE/IRON] #321

Merged — 3 commits merged into ros2 on May 28, 2024

Conversation

@Samahu (Contributor) commented Apr 16, 2024

Related Issues & PRs

Problem Analysis

  • When adding multiple subscribers, or when running on a bad network or a low-performing system, LidarScan composition and processing can take progressively more time. This leads to a build-up in the packets buffer, which is eventually exhausted, and the current thread_safe_ring_buffer::write_overwrite mechanism only makes things worse. The solution is one of two things: either ensure that LidarScan composition + processing takes less time than adding new packets, or decouple packet addition from LidarScan consumption and throttle when the consumption thread can't keep up.

Summary of Changes

  • Implement lock free ring buffer with unit tests
  • Decouple packet reception from packet processing
  • Throttle the ring buffer as we reach 70% capacity
  • Drop packets only when the ring buffer has been completely exhausted
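
The behavior described in the list above can be sketched as follows. This is a hypothetical illustration, not the actual ouster-ros code: a minimal single-producer/single-consumer (SPSC) lock-free ring buffer whose producer checks a 70%-capacity threshold and throttles instead of overwriting; the class name is borrowed from the PR, but the internals are assumed.

```cpp
// Hypothetical SPSC lock-free ring buffer sketch (not the PR's actual code).
#include <atomic>
#include <cstddef>
#include <vector>

template <typename T>
class LockFreeRingBuffer {
   public:
    explicit LockFreeRingBuffer(std::size_t capacity)
        : buf_(capacity + 1) {}  // one slot kept empty to tell full from empty

    // Producer side: returns false only when the buffer is completely
    // exhausted, i.e. the one case where a packet would be dropped.
    bool push(const T& item) {
        const std::size_t head = head_.load(std::memory_order_relaxed);
        const std::size_t next = (head + 1) % buf_.size();
        if (next == tail_.load(std::memory_order_acquire)) return false;
        buf_[head] = item;
        head_.store(next, std::memory_order_release);
        return true;
    }

    // Consumer side: returns false when there is nothing to read.
    bool pop(T& item) {
        const std::size_t tail = tail_.load(std::memory_order_relaxed);
        if (tail == head_.load(std::memory_order_acquire)) return false;
        item = buf_[tail];
        tail_.store((tail + 1) % buf_.size(), std::memory_order_release);
        return true;
    }

    std::size_t size() const {
        const std::size_t head = head_.load(std::memory_order_acquire);
        const std::size_t tail = tail_.load(std::memory_order_acquire);
        return (head + buf_.size() - tail) % buf_.size();
    }

    // The producer can use this to start throttling (e.g. briefly sleep
    // between socket reads) once the buffer reaches 70% of capacity.
    bool should_throttle() const {
        return size() * 10 >= (buf_.size() - 1) * 7;
    }

   private:
    std::vector<T> buf_;
    std::atomic<std::size_t> head_{0};  // written only by the producer
    std::atomic<std::size_t> tail_{0};  // written only by the consumer
};
```

With this split, the receive thread only ever calls push/should_throttle and the LidarScan thread only ever calls pop, which is what makes the single-producer/single-consumer assumption (and hence the lock-free design) hold.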

TODO(s):

  • Replace the LockFreeRingBuffer implementation with ThreadSafeRingBuffer within os_sensor_nodelet.
    • Alternatively, forward the packets directly to LidarScan without a staging buffer.

Validation

  • Run the driver as usual with RVIZ, and add multiple subscribers to /ouster/points.
  • Verify the problem with missing columns is eliminated or reduced.
  • Observe that the point cloud publish frequency isn't affected significantly

@Samahu Samahu added the enhancement New feature or request label Apr 16, 2024
@Samahu Samahu self-assigned this Apr 16, 2024
@Samahu Samahu changed the base branch from master to ros2 April 16, 2024 15:01
@Samahu Samahu changed the title WIP: Implement lock free ring buffer with throttling [humble/iron] WIP: Implement lock free ring buffer with throttling [HUMBLE/IRON] Apr 16, 2024
@Samahu Samahu changed the title WIP: Implement lock free ring buffer with throttling [HUMBLE/IRON] Implement lock free ring buffer with throttling [HUMBLE/IRON] Apr 16, 2024
@TillBeemelmanns commented Apr 23, 2024

We tested both this branch and the attempt in #302 to solve the problem with missing parts of the point cloud. We set the receive buffer size to the following values

sudo sysctl -w net.core.rmem_default=26214400
sudo sysctl -w net.core.rmem_max=26214400

on the host machine and also set proc_mask:="PCL". However, in all cases we observed partially missing point clouds as soon as we added a second subscriber node to the driver. That applied to resolutions 1024x20, 2048x10, and 4096x5; for the other resolutions (512x10, 512x20, 1024x10) we observed a significant delay in the point cloud without visible packet loss.

For all resolutions, the point cloud looks fine if only RVIZ is used for visualization (single subscriber).
The host is a high performance workstation with 64 cores and 128 GB RAM.
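
For reference, the sysctl commands quoted above take effect immediately but do not survive a reboot. One common way to persist them on a standard Linux host (the drop-in file name below is illustrative, not from the PR) is:

```shell
# /etc/sysctl.d/99-udp-buffers.conf  (illustrative file name)
net.core.rmem_default=26214400
net.core.rmem_max=26214400
```

then reload all sysctl configuration with `sudo sysctl --system`.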

@marioney

We also tested this PR with an OS-1-128 Rev 7 and it fixed the missing points issue for us, running ROS Humble in Ubuntu 22.04.

We tested both 1024x10 and 1024x20 modes and the issue was not present in any of those.

@Samahu (Contributor, Author) commented Apr 24, 2024

@TillBeemelmanns @marioney thank you both for the feedback.

@TillBeemelmanns sorry that didn't help in your situation. I still have one TODO item that I expect will help with the delay and might improve things overall. I should have this PR completed and merged by next week.

@mkiael commented May 27, 2024

Tested on OS0-128 Rev C running ROS Iron on Ubuntu 22.04. Solves issue with missing data.

Great work!

@Samahu (Contributor, Author) commented May 28, 2024

Thanks @mkiael for the feedback. I will proceed with the PR as is, since it resolves the issue for most of the driver's users, and will address the rest of the suggested improvements in a separate PR.

@Samahu Samahu added the bug Something isn't working label May 28, 2024
@Samahu Samahu marked this pull request as ready for review May 28, 2024 16:26
@Samahu Samahu merged commit 81b425a into ros2 May 28, 2024
3 checks passed
@Samahu Samahu deleted the ros2-implement_lock_free_ring_buffer_with_throttling branch May 28, 2024 16:28
Labels: bug (Something isn't working), enhancement (New feature or request)
Issues that merging this pull request may close: Parts of point cloud missing; Missing lidar chunk os1-128-u
4 participants