Improve mGPU Gaussian tile intersection by matthewdcong · Pull Request #664 · openvdb/fvdb-core

matthewdcong · 2026-05-30T15:20:30Z

Previous iterations of mGPU Gaussian tile intersection include:

1. Distribute every step of a single tile sort across all GPUs. Compute the tile intersections for all Gaussians across all tiles into a single tensor, then run a parallel mGPU radix sort. The parallel radix sort requires significantly more communication and synchronization, as well as temporary output key and value arrays for cross-device merging. (previously in main)
2. Observe that the radix sort is independent for each camera, and run the radix sort for each camera entirely on a single device. This works well when batch size = num GPUs, but performs poorly when batch size = 1 and num GPUs > 1, because only one GPU is used and all data must be gathered to that GPU. (currently in main)
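
To illustrate why the per-camera strategy underutilizes devices at small batch sizes, here is a minimal sketch. It assumes cameras are assigned to devices round-robin, which may not match the actual assignment in fvdb-core; `assign_cameras_round_robin` and `busy_devices` are hypothetical helper names, not library functions.

```python
from typing import List


def assign_cameras_round_robin(num_cameras: int, num_devices: int) -> List[List[int]]:
    """Assign each camera's sort to one device (assumed round-robin policy)."""
    work: List[List[int]] = [[] for _ in range(num_devices)]
    for cam in range(num_cameras):
        work[cam % num_devices].append(cam)
    return work


def busy_devices(work: List[List[int]]) -> int:
    """Count the devices that received at least one camera to sort."""
    return sum(1 for cams in work if cams)


# With batch size == num GPUs every device has work; with batch size 1 only one does.
full_batch = busy_devices(assign_cameras_round_robin(num_cameras=8, num_devices=8))
single_camera = busy_devices(assign_cameras_round_robin(num_cameras=1, num_devices=8))
```

Under this assumption, the single-camera case leaves seven of eight devices idle during the sort, which is the situation the PR targets.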

This PR introduces a more performant strategy. First, we assign each GPU a contiguous range of tiles to render. Then, on each GPU, we compute the intersections of all Gaussians with only that tile range. Because the tile keys increase monotonically from one GPU's range to the next, the subsequent sort is decoupled: each GPU sorts its own Gaussian-tile intersection list independently, and the flattened concatenation of the per-GPU lists is guaranteed to be sorted. This significantly reduces the communication and data transfer required during sorting. Moreover, switching from radix sort to merge sort lets us remove the temporary output buffers, which further reduces stalls due to prefetching and lowers peak memory utilization.
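
The decoupling argument can be checked with a small CPU-only sketch. This is not the CUDA implementation from the PR: it assumes a simplified model in which each Gaussian covers a 1D span of tile IDs and the sort key is `(tile_id, depth, gaussian_id)`, and all function names are hypothetical. Each simulated device only sees its own tile range, sorts independently, and the concatenated result matches a single global sort.

```python
from typing import List, Tuple

Gaussian = Tuple[int, int, float]   # (first_tile, last_tile, depth), simplified 1D footprint
Key = Tuple[int, float, int]        # (tile_id, depth, gaussian_id)


def tile_ranges(num_tiles: int, num_devices: int) -> List[Tuple[int, int]]:
    """Split [0, num_tiles) into contiguous half-open ranges, one per device, in order."""
    base, extra = divmod(num_tiles, num_devices)
    ranges, start = [], 0
    for d in range(num_devices):
        stop = start + base + (1 if d < extra else 0)
        ranges.append((start, stop))
        start = stop
    return ranges


def intersections_for_range(gaussians: List[Gaussian], tile_range: Tuple[int, int]) -> List[Key]:
    """Emit intersection keys only for tiles inside this device's range."""
    lo, hi = tile_range
    keys: List[Key] = []
    for gid, (first_tile, last_tile, depth) in enumerate(gaussians):
        for tile in range(max(first_tile, lo), min(last_tile + 1, hi)):
            keys.append((tile, depth, gid))
    return keys


def decoupled_sort(gaussians: List[Gaussian], num_tiles: int, num_devices: int) -> List[Key]:
    """Sort each device's keys independently, then concatenate in device order."""
    flattened: List[Key] = []
    for tile_range in tile_ranges(num_tiles, num_devices):
        keys = intersections_for_range(gaussians, tile_range)
        keys.sort()  # no cross-device communication is needed for this step
        flattened.extend(keys)
    return flattened


gaussians = [(0, 5, 2.0), (3, 9, 1.0), (7, 7, 0.5)]
reference = sorted(intersections_for_range(gaussians, (0, 10)))  # single global sort
result = decoupled_sort(gaussians, num_tiles=10, num_devices=3)
```

Because the tile ranges are disjoint and ordered by device, every key produced on one device compares less than every key on the next, so no merge step across devices is required.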

This is a performance improvement across the board, and it becomes more significant as the number of GPUs increases. On 8x A100s, it improves end-to-end reconstruction performance by about 15% with a batch size of 1 (on a relatively small problem).

Signed-off-by: Matthew Cong <mcong@nvidia.com>

swahtz

One super minor comment not worth blocking for but other than that, it looks great to me! Thanks

Co-authored-by: Jonathan Swartz <jonathan@jswartz.info> Signed-off-by: Matthew Cong <mcong@nvidia.com>

matthewdcong added 2 commits May 30, 2026 08:07

Pre-partition intersections in mGPU Gaussian-tile sort

04415be

Signed-off-by: Matthew Cong <mcong@nvidia.com>

Store device totals in pinned memory

890a32d

Signed-off-by: Matthew Cong <mcong@nvidia.com>

matthewdcong requested a review from a team as a code owner May 30, 2026 15:20

matthewdcong requested review from blackencino and sifakis May 30, 2026 15:20

matthewdcong added 2 commits May 30, 2026 16:57

Fix template deduction

dd2d6ff

Signed-off-by: Matthew Cong <mcong@nvidia.com>

Fix sync call

94c5143

Signed-off-by: Matthew Cong <mcong@nvidia.com>

swahtz added the optimization (Performance or memory optimization) and Gaussian Splatting (Issues related to Gaussian splatting in the core library) labels Jun 3, 2026

swahtz added this to the v0.5 milestone Jun 3, 2026

swahtz reviewed Jun 3, 2026

Comment thread src/fvdb/detail/ops/gsplat/IntersectGaussianTiles.cu Outdated

swahtz approved these changes Jun 3, 2026

matthewdcong enabled auto-merge (squash) June 3, 2026 15:21

Update src/fvdb/detail/ops/gsplat/IntersectGaussianTiles.cu

3458183

Co-authored-by: Jonathan Swartz <jonathan@jswartz.info> Signed-off-by: Matthew Cong <mcong@nvidia.com>

matthewdcong force-pushed the decoupled_mgpu_sort branch from d166135 to 3458183 Compare June 3, 2026 15:25

matthewdcong merged commit 8a26163 into openvdb:main Jun 3, 2026
39 checks passed

matthewdcong merged 5 commits into openvdb:main from matthewdcong:decoupled_mgpu_sort
