Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

KDTree creation takes much time for big clouds #60

Open
Alevs2R opened this issue Jun 22, 2021 · 1 comment
Open

KDTree creation takes much time for big clouds #60

Alevs2R opened this issue Jun 22, 2021 · 1 comment

Comments

@Alevs2R
Copy link
Contributor

Alevs2R commented Jun 22, 2021

Problem

I discovered that KDtree creation takes significantly more time when aligning with a big target cloud (with a map).

KDtree creation takes only 3ms for clouds with 17k points, but it takes 90ms for 370k points (my small 100x100 meters filtered map).

PCL creates kdtree inside "align" method, so it means that ndt_cuda (P2D) runs for 98ms instead of 8ms during first run.

Moreover, PCL spends 20-30ms to calculate fitness score, regardless of the size of the target map. It is also not very good 😁.

Detailed benchmark results

Aligning big target cloud with kdtree creation

target:364444[pts] source:18564[pts]
--- fgicp_st ---
single:2293.17[msec] 
--- fgicp_mt ---
single:460.209[msec] 
--- vgicp_st ---
single:2132.44[msec] 
--- vgicp_mt ---
single:381.7[msec] 
--- ndt_cuda (P2D) ---
single:98.3882[msec] 
--- ndt_cuda (D2D) ---
single:93.0577[msec] 
--- vgicp_cuda (parallel_kdtree) ---
single:340.319[msec] 
--- vgicp_cuda (gpu_bruteforce) ---
single:9404.37[msec] 
--- vgicp_cuda (gpu_rbf_kernel) ---
terminate called after throwing an instance of 'thrust::system::detail::bad_alloc'
  what():  std::bad_alloc: cudaErrorMemoryAllocation: out of memory
Aborted (core dumped)

Aligning big target cloud without kdtree creation

target:364444[pts] source:18564[pts]
--- fgicp_st ---
single:2224.49[msec] 
--- fgicp_mt ---
single:312.474[msec] 
--- vgicp_st ---
single:2044.39[msec] 
--- vgicp_mt ---
single:293.93[msec] 
--- ndt_cuda (P2D) ---
single:8.7521[msec] 
--- ndt_cuda (D2D) ---
single:7.81053[msec] 
--- vgicp_cuda (parallel_kdtree) ---
single:246.274[msec] 
--- vgicp_cuda (gpu_bruteforce) ---
single:9536.63[msec] 
--- vgicp_cuda (gpu_rbf_kernel) ---
terminate called after throwing an instance of 'thrust::system::detail::bad_alloc'
  what():  std::bad_alloc: cudaErrorMemoryAllocation: out of memory
Aborted (core dumped)

Aligning default clouds with kdtree creation

--- fgicp_st ---
single:263.107[msec] 100times:26208.3[msec] 100times_reuse:17901.3[msec] fitness_score:0.204374
--- fgicp_mt ---
single:53.8672[msec] 100times:3846.4[msec] 100times_reuse:2469.65[msec] fitness_score:0.204384
--- vgicp_st ---
single:185.224[msec] 100times:18208.6[msec] 100times_reuse:9911.83[msec] fitness_score:0.205022
--- vgicp_mt ---
single:54.0197[msec] 100times:2933.43[msec] 100times_reuse:1687.04[msec] fitness_score:0.205022
--- ndt_cuda (P2D) ---
single:8.74302[msec] 100times:861.372[msec] 100times_reuse:499.426[msec] fitness_score:0.197207
--- ndt_cuda (D2D) ---
single:6.07843[msec] 100times:587.673[msec] 100times_reuse:315.464[msec] fitness_score:0.198976
--- vgicp_cuda (parallel_kdtree) ---
single:31.5053[msec] 100times:2363.01[msec] 100times_reuse:992.915[msec] fitness_score:0.204697
--- vgicp_cuda (gpu_bruteforce) ---
single:45.3529[msec] 100times:3512.59[msec] 100times_reuse:1685.24[msec] fitness_score:0.204693
--- vgicp_cuda (gpu_rbf_kernel) ---
single:7.57619[msec] 100times:750.764[msec] 100times_reuse:309.824[msec] fitness_score:0.204584

Aligning default clouds without kdtree creation

--- fgicp_st ---
single:260.423[msec] 100times:25678.1[msec] 100times_reuse:17731.4[msec]
--- fgicp_mt ---
single:82.7624[msec] 100times:3546.36[msec] 100times_reuse:2466.41[msec]
--- vgicp_st ---
single:182.533[msec] 100times:17876.4[msec] 100times_reuse:9981.22[msec]
--- vgicp_mt ---
single:36.4721[msec] 100times:2744.13[msec] 100times_reuse:1607.6[msec]
--- ndt_cuda (P2D) ---
single:6.20965[msec] 100times:558.324[msec] 100times_reuse:472.443[msec]
--- ndt_cuda (D2D) ---
single:3.04992[msec] 100times:306.246[msec] 100times_reuse:303.85[msec]
--- vgicp_cuda (parallel_kdtree) ---
single:37.2568[msec] 100times:1809.56[msec] 100times_reuse:1083.63[msec]
--- vgicp_cuda (gpu_bruteforce) ---
single:42.7751[msec] 100times:3270.32[msec] 100times_reuse:1685.56[msec]
--- vgicp_cuda (gpu_rbf_kernel) ---
single:4.78256[msec] 100times:472.186[msec] 100times_reuse:306.657[msec]

Simple solution

My solution is to set empty kdtree as search method and enable force_no_recompute flag before calling "align".
Like this:

pcl::search::KdTree<pcl::PointXYZ>::Ptr empty_kdtree(new pcl::search::KdTree<pcl::PointXYZ>);
reg.setSearchMethodTarget(empty_kdtree, true);

You can easily repeat my experiment in fork, branch test/disable-kdtree. Command:

build/gicp_align data/saved_map_cloud.pcd data/saved_scan.pcd

Better solution

Better solution would be to replace PCL default kdtree with something more efficient 😊

@koide3
Copy link
Owner

koide3 commented Jun 26, 2021

Thanks for letting me know this issue and providing nice benchmark results. When I wrote the code, I noticed that pcl::Registration creates kdtree inside, but I thought its construction cost would be negligible. I'll disable the kdtree creation with force_no_recompute.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants