Optimize Gaussian tile intersection for mGPU #446

Merged: matthewdcong merged 1 commit into openvdb:main from matthewdcong:mgpu_isect_prefetch on Feb 9, 2026
@matthewdcong matthewdcong commented Feb 6, 2026

Contributor
  1. Using tilesPerGaussianCumsum, we can prefetch exactly the range of intersection keys and values needed by the subsequent computeGaussianTileIntersections kernel. This significantly improves the kernel's performance and reduces the variance in its execution time, which goes from 15-30 ms to a consistent 3 ms. The result is an end-to-end performance increase of about 7-8%.
  2. The overhead of the prefetch when merging keys in the multi-GPU radix sort was larger than the penalty incurred for (rare) page faults. Removing the prefetch marginally increases performance.
  3. Some small const fixes.
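
For readers unfamiliar with the idea in item 1, here is a minimal host-side sketch of how a cumulative sum can yield an exact prefetch range. It is not the code from this PR: the helper name intersectionRangeForGaussians, the IntersectionRange struct, the use of an inclusive prefix sum, and the per-device Gaussian range [gaussianBegin, gaussianEnd) are all assumptions made for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Half-open range [offset, offset + count) of intersection entries.
// Hypothetical type, introduced only for this sketch.
struct IntersectionRange {
    std::size_t offset;
    std::size_t count;
};

// Hypothetical helper. Assumes tilesPerGaussianCumsum is an INCLUSIVE prefix
// sum of the number of tiles each Gaussian touches, and that one device
// handles the Gaussians in [gaussianBegin, gaussianEnd). Returns the range of
// intersection keys/values that those Gaussians occupy.
inline IntersectionRange
intersectionRangeForGaussians(const std::vector<std::int32_t> &tilesPerGaussianCumsum,
                              std::size_t gaussianBegin,
                              std::size_t gaussianEnd) {
    assert(gaussianBegin <= gaussianEnd);
    assert(gaussianEnd <= tilesPerGaussianCumsum.size());

    // With an inclusive prefix sum, entry i - 1 holds the number of
    // intersections produced by Gaussians 0 .. i - 1, which is the first
    // output slot of Gaussian i.
    const std::size_t first =
        gaussianBegin == 0
            ? 0
            : static_cast<std::size_t>(tilesPerGaussianCumsum[gaussianBegin - 1]);
    const std::size_t last =
        gaussianEnd == 0
            ? 0
            : static_cast<std::size_t>(tilesPerGaussianCumsum[gaussianEnd - 1]);

    return IntersectionRange{first, last - first};
}
```

In a multi-GPU setting, the returned offset and count (scaled by the element size of the key and value buffers) would give the byte range to hand to a unified-memory prefetch before the kernel launch, so each device pulls in only the pages it will actually touch.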

Signed-off-by: Matthew Cong <mcong@nvidia.com>
@matthewdcong matthewdcong requested a review from a team as a code owner February 6, 2026 18:38

@swahtz swahtz left a comment

Contributor

Nice! Thanks for that.

@matthewdcong matthewdcong merged commit 542c6a1 into openvdb:main Feb 9, 2026
32 checks passed
2 participants