Handle duplicate pixels in sparse pixel gaussian rendering by harrism · Pull Request #488 · openvdb/fvdb-core

harrism · 2026-03-03T05:09:04Z

Summary

- Deduplicate pixel coordinates at the binding layer (deduplicatePixels) before passing them to computeSparseInfo and the rasterization kernels, then scatter results back via index_select. This avoids any CUDA kernel modifications while correctly handling duplicate (batchIdx, row, col) entries that would otherwise cause incorrect tile bitmasks.
- Handle edge cases: single-list JaggedTensors with empty jidx, correct jlidx passthrough for ldim() consistency, and reduction of numContributingGaussians to unique-pixel space for the contributing IDs kernel.
- Add comprehensive C++ unit tests for deduplicatePixels (20 tests covering empty, single, all-unique, duplicates, multi-batch, round-trip reconstruction, and per-batch offset validation) and Python end-to-end tests for all sparse render APIs with duplicate pixels (6 tests covering depths, images, backward gradients, num_contributing, contributing_ids, and multi-camera).
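As a rough illustration of the dedup-then-scatter idea in the first bullet, here is a pure-Python sketch. It is not the PR's code: Python lists stand in for libtorch tensors, `render_pixel` is a hypothetical per-pixel stand-in for the rasterizer, and a dict is used for deduplication instead of the sort-based approach the PR takes. Rendering each unique pixel once and gathering through the inverse indices reproduces the per-pixel result for every input, duplicates included.

```python
from typing import Callable, Dict, List, Tuple

Pixel = Tuple[int, int, int]  # (batchIdx, row, col)


def dedup_with_inverse(pixels: List[Pixel]) -> Tuple[List[Pixel], List[int]]:
    """Unique pixels in first-seen order, plus, for every input pixel,
    the index of its unique representative."""
    slot: Dict[Pixel, int] = {}
    unique: List[Pixel] = []
    inverse: List[int] = []
    for p in pixels:
        if p not in slot:
            slot[p] = len(unique)
            unique.append(p)
        inverse.append(slot[p])
    return unique, inverse


def render_sparse(pixels: List[Pixel], render_pixel: Callable[[Pixel], float]) -> List[float]:
    unique, inverse = dedup_with_inverse(pixels)
    # Each unique pixel is rendered exactly once...
    unique_results = [render_pixel(p) for p in unique]
    # ...and results are gathered back into input order (the role index_select plays).
    return [unique_results[i] for i in inverse]
```

The key property is that the downstream kernels only ever see unique coordinates, while callers still get one output per requested pixel.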

Fixes #106

Test plan

- C++ DeduplicatePixelsTest: 20/20 pass
- C++ GaussianComputeSparseInfoTest: 14/14 pass
- Python TestGaussianRenderSparseDuplicatePixels: 6/6 pass

Deduplicate pixel coordinates at the binding layer before passing them to computeSparseInfo and the rasterization kernels, then scatter results back via index_select. This avoids CUDA kernel changes while correctly handling duplicate (batchIdx, row, col) entries that cause incorrect tile bitmasks.

- Add deduplicatePixels() that encodes pixels as int64 keys, sorts to find unique groups, and builds inverse indices for scatter-back
- Handle single-list JaggedTensors where jidx() is empty
- Pass deduplicated pixels through all sparse render paths (images, depths, num_contributing, contributing_ids) and scatter results back
- Reduce numContributingGaussians to unique-pixel space before passing to the contributing IDs kernel
- Add C++ unit tests for deduplicatePixels (20 tests)
- Add Python end-to-end tests for all sparse render APIs with duplicate pixels (6 tests)

Fixes openvdb#106

Signed-off-by: Mark Harris <mharris@nvidia.com>
Made-with: Cursor
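The first bullet (int64 keys, sort, inverse indices) can be sketched in pure Python as below. Assumptions not taken from the PR: keys are packed as `(batch * height + row) * width + col`, which is collision-free when `0 <= row < height` and `0 <= col < width` (and fits in int64 when `batch_count * height * width < 2**63`); `height`/`width` are hypothetical parameters; and the loop stands in for the tensor ops (argsort, a boundary mask, cumsum, scatter) the C++ would use.

```python
from typing import List, Tuple


def encode_key(batch: int, row: int, col: int, height: int, width: int) -> int:
    """Pack (batch, row, col) into one integer; collision-free for in-bounds row/col."""
    return (batch * height + row) * width + col


def dedup_sorted(keys: List[int]) -> Tuple[List[int], List[int]]:
    """Sort-based dedup: returns (unique_keys, inverse) with unique_keys[inverse[i]] == keys[i]."""
    order = sorted(range(len(keys)), key=keys.__getitem__)  # argsort
    unique_keys: List[int] = []
    inverse = [0] * len(keys)
    for orig_pos in order:
        k = keys[orig_pos]
        # Equal keys are adjacent after sorting, so a new group starts whenever the key changes.
        if not unique_keys or unique_keys[-1] != k:
            unique_keys.append(k)
        # Scatter this element's group id back to its original position.
        inverse[orig_pos] = len(unique_keys) - 1
    return unique_keys, inverse
```

Because the batch occupies the high-order part of each key in this packing, the sorted unique keys also come out grouped by batch, which is what makes per-batch offsets cheap to compute afterwards.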

blackencino

LGTM, Ship it!

Looking specifically at the actual de-duplication, every part of what exists in the C++ there could have been done as a torch python operation. Is there a specific reason this needed to be compiled?

harrism · 2026-03-03T20:48:10Z

> Looking specifically at the actual de-duplication, every part of what exists in the C++ there could have been done as a torch python operation. Is there a specific reason this needed to be compiled?

No reason other than that the current implementation is all C++. The original proposal was a much more complicated approach that used custom kernels. Performance may be lower in Python, naturally, but I don't think this operation will be a bottleneck.

swahtz

Looks great! I just had a few potential optimization suggestions, since these will get called on every rendering iteration.

I was also wondering whether it would be worth having a unit test where all pixels in each batch are duplicates, or perhaps that's redundant with the tests where just some pixels are duplicates.

Use torch::bincount instead of zeros+scatter_add_ for counting unique pixels per batch, and reuse uniqueBatchIdx directly as the new jidx instead of round-tripping through jidx_from_joffsets.

Co-authored-by: swahtz <2375296+swahtz@users.noreply.github.com>
Signed-off-by: Mark Harris <mharris@nvidia.com>
Made-with: Cursor
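The bincount change in this commit can be sketched in pure Python as follows. The helper name `unique_jagged_layout` is hypothetical, and `bincount` here is a list-based stand-in for `torch.bincount(values, minlength=...)`. The sketch assumes the unique pixels are already sorted by batch (as with a sort over keys whose high-order part is the batch index), so their batch indices are non-decreasing and can serve directly as the jagged index.

```python
from itertools import accumulate
from typing import List, Tuple


def bincount(values: List[int], minlength: int = 0) -> List[int]:
    """Pure-Python stand-in for torch.bincount(values, minlength=minlength)."""
    counts = [0] * max(minlength, max(values, default=-1) + 1)
    for v in values:
        counts[v] += 1
    return counts


def unique_jagged_layout(unique_batch_idx: List[int], num_batches: int) -> Tuple[List[int], List[int]]:
    """Hypothetical helper: (jidx, joffsets) for the deduplicated pixels."""
    # One bincount replaces allocating zeros and scatter_add_-ing ones into it.
    counts = bincount(unique_batch_idx, minlength=num_batches)
    joffsets = [0] + list(accumulate(counts))
    # The unique batch indices are already sorted, so they are reused as-is as jidx
    # instead of being regenerated from joffsets.
    return list(unique_batch_idx), joffsets
```

Passing `minlength=num_batches` matters: it keeps a trailing batch with no pixels from disappearing from the offsets.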

cumsum on a bool tensor already returns int64, so the explicit cast allocated an unnecessary temporary. Also update the docstring to describe the actual sort-based dedup rather than torch::unique.

Signed-off-by: Mark Harris <mharris@nvidia.com>
Made-with: Cursor

Test the maximum-compression case where every batch collapses to a single unique pixel, verifying offsets and inverse indices are correct.

Co-authored-by: swahtz <2375296+swahtz@users.noreply.github.com>
Signed-off-by: Mark Harris <mharris@nvidia.com>
Made-with: Cursor
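A pure-Python sketch of what that maximum-compression case checks, written against a hypothetical set-based reference dedup (`dedup_pixels`) rather than the actual C++ `deduplicatePixels` test fixture. With B batches that each contain only copies of one pixel, the offsets should be 0, 1, ..., B and every inverse index should point at its batch's single unique pixel.

```python
from typing import List, Tuple

Pixel = Tuple[int, int, int]  # (batchIdx, row, col)


def dedup_pixels(pixels: List[Pixel], num_batches: int) -> Tuple[List[Pixel], List[int], List[int]]:
    """Hypothetical reference dedup: returns (unique_pixels, inverse, joffsets)."""
    unique = sorted(set(pixels))  # lexicographic order groups pixels by batch
    index_of = {p: i for i, p in enumerate(unique)}
    inverse = [index_of[p] for p in pixels]
    joffsets = [0] * (num_batches + 1)
    for b, _, _ in unique:
        joffsets[b + 1] += 1  # count unique pixels per batch
    for b in range(num_batches):
        joffsets[b + 1] += joffsets[b]  # prefix sum into offsets
    return unique, inverse, joffsets


def check_max_compression(num_batches: int, copies: int) -> None:
    # Every batch holds `copies` copies of one pixel, so each batch
    # should collapse to exactly one unique pixel.
    pixels = [(b, 5, 7) for b in range(num_batches) for _ in range(copies)]
    unique, inverse, joffsets = dedup_pixels(pixels, num_batches)
    assert unique == [(b, 5, 7) for b in range(num_batches)]
    assert joffsets == list(range(num_batches + 1))
    assert inverse == [b for b in range(num_batches) for _ in range(copies)]
```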

harrism · 2026-03-03T22:59:49Z

Great suggestions, @swahtz. I implemented your two optimizations, and found one other optimization and a stale comment. I also added your suggested maximum-compression test case.

fwilliams

A few small notes on the algorithm. I think this could be tightened up to use fewer allocations. It could also be implemented with fixed up-front memory using a couple of CUDA kernels if you wanted to do it that way. I'll defer to you as to which you think is better.

harrism · 2026-03-04T02:56:50Z

I decided against the custom kernel approach because, by definition, these are low-cost operations in terms of memory and computation. For sparse rendering, N (the number of pixels) should be small, say 5K to 50K. I estimate the memory savings of a custom kernel to be at most ~1MB at that scale. Even if we render at the scale of 500K pixels (25% of a 1080p image), we're talking about only 10MB of savings. On the scale of 3DGS memory usage, this is peanuts.
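A quick back-of-envelope check of the scale in this comment, assuming the relevant temporaries are int64 tensors of length N (8 bytes per element). The helper is illustrative only, and how many such temporaries a custom kernel would avoid is not stated in the thread.

```python
INT64_BYTES = 8


def int64_tensor_bytes(n: int) -> int:
    """Size of one length-n int64 temporary, in bytes."""
    return INT64_BYTES * n


# 25% of a 1920x1080 frame is the "500K pixels" case from the comment.
quarter_1080p = (1920 * 1080) // 4

for n in (5_000, 50_000, 500_000):
    print(f"N={n:>7,}: {int64_tensor_bytes(n) / 1e6:.2f} MB per int64 temporary")
```

One such temporary is 0.4 MB at N = 50K and 4 MB at N = 500K. On that reading, the quoted ~1MB and ~10MB ceilings correspond to avoiding roughly two or three of them (an interpretation, not a figure from the PR), and a quarter of a 1080p frame is 518,400 pixels.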

The main reason for custom kernels here would be launch overhead and possibly synchronization reduction. But I don't think that should be addressed until it is determined to be a bottleneck.

- Skip batch-index zeros tensor for single-list JaggedTensors by branching the key computation (avoids N*8 byte allocation)
- Use in-place cumsum_(0).sub_(1) for group ID assignment, computing firstInSorted before the mutation (avoids one N*8 byte temporary)

Co-authored-by: Francis Williams <francisw@nvidia.com>
Signed-off-by: Mark Harris <mharris@nvidia.com>
Made-with: Cursor
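Both optimizations in this commit can be sketched in pure Python. Assumptions: the same hypothetical `(batch * height + row) * width + col` key packing, and the boundary flags being needed for something else before they are overwritten (here, picking out the unique keys). Python lists only mimic the buffer reuse; the actual savings come from libtorch's in-place `cumsum_`/`sub_`.

```python
from typing import List, Optional, Tuple


def pixel_keys(rows: List[int], cols: List[int], height: int, width: int,
               batch_idx: Optional[List[int]] = None) -> List[int]:
    """Hypothetical key packing; batch_idx is None/empty for a single-list input."""
    if not batch_idx:
        # Single list: no batch term, so no all-zeros batch index is ever built.
        return [r * width + c for r, c in zip(rows, cols)]
    return [(b * height + r) * width + c for b, r, c in zip(batch_idx, rows, cols)]


def unique_and_group_ids(sorted_keys: List[int]) -> Tuple[List[int], List[int]]:
    """From already-sorted keys, return (unique_keys, group_ids)."""
    n = len(sorted_keys)
    buf = [1 if i == 0 or sorted_keys[i] != sorted_keys[i - 1] else 0 for i in range(n)]
    # Consume the "first in sorted run" flags before the buffer is overwritten.
    unique_keys = [k for k, first in zip(sorted_keys, buf) if first]
    # Reuse the same buffer: in-place inclusive cumsum, then subtract 1 (cf. cumsum_(0).sub_(1)).
    for i in range(1, n):
        buf[i] += buf[i - 1]
    for i in range(n):
        buf[i] -= 1
    return unique_keys, buf
```

The ordering constraint is the point of the second bullet: once the flag buffer has been turned into group IDs in place, the first-in-run information is gone, so anything derived from it must be computed first.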

harrism · 2026-03-04T03:39:44Z

@fwilliams I implemented the small optimizations you suggested. Torch's cumsum_ doesn't work on bool tensors on CUDA, so I had to cast first. That reduces the memory savings a bit, but there's still about 8N bytes saved (one fewer allocation) for that change, rather than 16N.

harrism requested a review from a team as a code owner March 3, 2026 05:09

harrism requested review from matthewdcong and swahtz March 3, 2026 05:09

harrism added the new feature (New feature or request) and Gaussian Splatting (Issues related to Gaussian splatting in the core library) labels Mar 3, 2026

harrism requested a review from fwilliams March 3, 2026 05:09

harrism added this to fvdb-realitycapture Mar 3, 2026

Apply clang-format to C++ sources

a7c9b5f

Signed-off-by: Mark Harris <mharris@nvidia.com> Made-with: Cursor

blackencino approved these changes Mar 3, 2026

swahtz reviewed Mar 3, 2026

Comment thread src/fvdb/GaussianSplat3d.cpp Outdated

swahtz reviewed Mar 3, 2026

Comment thread src/fvdb/GaussianSplat3d.cpp Outdated

swahtz reviewed Mar 3, 2026

View reviewed changes

Comment thread src/fvdb/GaussianSplat3d.cpp Outdated

swahtz approved these changes Mar 3, 2026

harrism and others added 3 commits March 4, 2026 09:44

harrism added 3 commits March 4, 2026 10:50

Merge remote-tracking branch 'origin/main' into issue-106-handle-duplicate-pixels-in-sparse-pixel-

c60d199

Merge remote-tracking branch 'upstream/main' into issue-106-handle-duplicate-pixels-in-sparse-pixel-

b8edcbd

Merge remote-tracking branch 'upstream/main' into issue-106-handle-duplicate-pixels-in-sparse-pixel-

8962935

Signed-off-by: Mark Harris <mharris@nvidia.com>
Made-with: Cursor

# Conflicts:
# tests/unit/test_gaussian_splat_3d.py

fwilliams reviewed Mar 4, 2026

Comment thread src/fvdb/GaussianSplat3d.cpp Outdated

fwilliams reviewed Mar 4, 2026

Comment thread src/fvdb/GaussianSplat3d.cpp Outdated

fwilliams reviewed Mar 4, 2026

fwilliams approved these changes Mar 4, 2026

Merge remote-tracking branch 'upstream/main' into issue-106-handle-duplicate-pixels-in-sparse-pixel-

ed37648

harrism merged commit 110bef3 into openvdb:main Mar 4, 2026
35 checks passed

github-project-automation Bot moved this to Done in fvdb-realitycapture Mar 4, 2026

harrism deleted the issue-106-handle-duplicate-pixels-in-sparse-pixel- branch March 4, 2026 23:48

harrism merged 10 commits into openvdb:main from harrism:issue-106-handle-duplicate-pixels-in-sparse-pixel-

4 participants
