Improve/optimize mGPU scaling via batched prefetching and sorting changes #499

Merged: matthewdcong merged 22 commits into openvdb:main from matthewdcong:mgpu_multibatch on Mar 5, 2026

@matthewdcong matthewdcong commented Mar 4, 2026

Contributor
  1. Coalesce consecutive cudaMemPrefetchAsync calls into a single cudaMemPrefetchBatchAsync call to amortize the per-call OS overhead.
  2. Decouple the mGPU radix sort into trivially parallel per-camera radix sorts. In effect, the data for each batch is sorted independently on each GPU rather than through one large mGPU radix sort.
  3. Switch from DeviceRadixSort to DeviceMergeSort for mGPU. In mGPU, the performance advantage of radix sort is outweighed by the additional cost of allocating separate input and output buffers.
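
The decoupling in item 2 can be illustrated with a small CPU-only sketch. This is not the code from this PR: the `Intersection` struct, the `offsets` array, and both function names are hypothetical, and `std::sort` stands in for the device sort. It assumes the items are already grouped by camera, and only shows that sorting each camera's range independently gives the same order as one global sort over (camera, key).

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical record: one item to be sorted for a given camera.
struct Intersection {
    std::uint32_t camera; // camera (batch) the item belongs to
    std::uint32_t key;    // sort key within that camera
};

// Baseline: a single global sort ordered by (camera, key).
std::vector<Intersection> sortGlobal(std::vector<Intersection> items) {
    std::sort(items.begin(), items.end(),
              [](const Intersection& a, const Intersection& b) {
                  return a.camera != b.camera ? a.camera < b.camera
                                              : a.key < b.key;
              });
    return items;
}

// Decoupled: items are assumed to be grouped by camera already, with
// camera c occupying [offsets[c], offsets[c + 1]). Each range is sorted
// on its own, so the loop iterations are independent of one another and
// could each be handed to a different device.
std::vector<Intersection> sortPerCamera(std::vector<Intersection> items,
                                        const std::vector<std::size_t>& offsets) {
    for (std::size_t c = 0; c + 1 < offsets.size(); ++c) {
        std::sort(items.begin() + static_cast<std::ptrdiff_t>(offsets[c]),
                  items.begin() + static_cast<std::ptrdiff_t>(offsets[c + 1]),
                  [](const Intersection& a, const Intersection& b) {
                      return a.key < b.key;
                  });
    }
    return items;
}
```

Because no range depends on another, the per-camera sorts need no cross-device communication, which is the property the description calls trivially parallel.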

@matthewdcong matthewdcong requested a review from a team as a code owner March 4, 2026 22:05
@matthewdcong matthewdcong requested review from harrism and sifakis March 4, 2026 22:05
Signed-off-by: Matthew Cong <mcong@nvidia.com>
@harrism harrism added labels Mar 4, 2026: enhancement (New feature or request), optimization (Performance or memory optimization), Gaussian Splatting (Issues related to Gaussian splatting in the core library)

@harrism harrism left a comment

Contributor

Looks good. I'd like you to document the two new utility functions before merging.

Comment thread src/fvdb/detail/ops/gsplat/FusedSSIM.cu
Comment thread src/fvdb/detail/ops/gsplat/GaussianRasterizeBackward.cu Outdated
Comment thread src/fvdb/detail/ops/gsplat/GaussianUtils.h
Signed-off-by: Matthew Cong <mcong@nvidia.com>
@matthewdcong matthewdcong enabled auto-merge (squash) March 5, 2026 06:12
@matthewdcong matthewdcong merged commit 366a74a into openvdb:main Mar 5, 2026
35 checks passed
Labels

enhancement (New feature or request), Gaussian Splatting (Issues related to Gaussian splatting in the core library), optimization (Performance or memory optimization)

Projects

Status: Done

2 participants