[Question] Stream and the best practice of Thrust STL usage #71
Thanks! Switching to async::copy may speed up the process.
Thanks. AFAIK, async::copy is not supported in Thrust, but from NVIDIA/thrust#827 (comment) it seems that most of Thrust's algorithms are blocking, except for those mentioned in NVIDIA/thrust#827 (comment). Pinned memory would also help to get an async copy, but pinned memory is not a good option for iGPU devices.
I tried it, and the calculation time was also faster using async::copy.

```python
import time
import cupoch as cph

if __name__ == "__main__":
    print("Load a ply point cloud, print it, and render it")
    pcd = cph.io.read_point_cloud("../../testdata/icp/cloud_bin_2.pcd")
    cph.visualization.draw_geometries([pcd])
    start = time.time()
    for _ in range(100):
        uni_down_pcd = pcd.uniform_down_sample(every_k_points=5)
    print(time.time() - start)
    cph.visualization.draw_geometries([uni_down_pcd])

# sync:  0.025475502014160156
# async: 0.018369436264038086
```
I have checked https://github.com/NVIDIA/thrust/blob/main/CHANGELOG.md. This policy can easily be applied to all code that uses these algorithms:

- thrust::async::reduce
- thrust::async::reduce_into, which takes a target location to store the reduction result into
- thrust::async::copy, including a two-policy overload that allows explicit cross-system copies to which execution policy properties can be attached
- thrust::async::transform
- thrust::async::for_each
- thrust::async::stable_sort
- thrust::async::sort

Great! For JetPack 4.4.1, the Thrust version is 1.9.7-1 (CUDA Toolkit 10.2 for Tegra).
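To make the discussion above concrete, here is a minimal sketch of the thrust::async::copy usage being suggested. It assumes Thrust >= 1.9.4 (which provides `<thrust/async/copy.h>`) and a CUDA-capable device; the buffer size is arbitrary. This is an illustration of the API listed in the changelog, not code from cupoch itself.

```cpp
// Sketch only: thrust::async::copy returns an event instead of blocking,
// so host work can overlap the device-to-host transfer.
#include <thrust/async/copy.h>
#include <thrust/device_vector.h>
#include <thrust/host_vector.h>
#include <thrust/execution_policy.h>

int main() {
    thrust::device_vector<float> d_points(1 << 20, 1.0f);
    thrust::host_vector<float>   h_points(1 << 20);

    // Two-policy overload for an explicit cross-system (device -> host) copy.
    auto evt = thrust::async::copy(thrust::device, thrust::host,
                                   d_points.begin(), d_points.end(),
                                   h_points.begin());

    // ... unrelated host-side work can run here while the copy is in flight ...

    evt.wait();  // block only when the result is actually needed
    return 0;
}
```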
cuda_stream is about concurrency of kernel functions. According to https://github.com/neka-nat/cupoch/blob/master/src/cupoch/geometry/pointcloud.h, each pointcloud has three vectors: points, normals, and colors. Streams could therefore be applied to pointcloud operations, but I found that not all functions apply this policy. For example, passthroughFilter uses only the default stream, while the downsample function uses three streams to perform the operations on each vector.
See:

- cupoch/src/cupoch/geometry/pointcloud.cu, line 382 (at 9b4859f)
- cupoch/src/cupoch/geometry/down_sample.cu, line 251 (at 9b4859f)
@neka-nat Could you explain why you chose to do so? Or are there any drawbacks to using streams in CUDA?
Thanks.
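For reference, the three-stream pattern described above can be sketched roughly as follows. The function and variable names here are hypothetical, not cupoch's real API; it only illustrates the general `thrust::cuda::par.on(stream)` technique of enqueuing one Thrust algorithm per cudaStream_t so the points/normals/colors vectors are processed concurrently.

```cpp
// Illustration only (hypothetical names): one gather per stream, so the
// three index-based copies can overlap on the device instead of
// serializing on the default stream.
#include <thrust/device_vector.h>
#include <thrust/gather.h>
#include <thrust/execution_policy.h>
#include <cuda_runtime.h>

void downsample_concurrently(const thrust::device_vector<float3>& points,
                             const thrust::device_vector<float3>& normals,
                             const thrust::device_vector<float3>& colors,
                             const thrust::device_vector<int>& indices,
                             thrust::device_vector<float3>& out_points,
                             thrust::device_vector<float3>& out_normals,
                             thrust::device_vector<float3>& out_colors) {
    cudaStream_t s[3];
    for (auto& st : s) cudaStreamCreate(&st);

    thrust::gather(thrust::cuda::par.on(s[0]), indices.begin(), indices.end(),
                   points.begin(), out_points.begin());
    thrust::gather(thrust::cuda::par.on(s[1]), indices.begin(), indices.end(),
                   normals.begin(), out_normals.begin());
    thrust::gather(thrust::cuda::par.on(s[2]), indices.begin(), indices.end(),
                   colors.begin(), out_colors.begin());

    for (auto& st : s) { cudaStreamSynchronize(st); cudaStreamDestroy(st); }
}
```

One possible drawback of this pattern is the per-call overhead of stream creation and synchronization, which can outweigh the overlap benefit for small vectors; that may be relevant to why not every function in the codebase uses it.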