Skip to content

Handle duplicate pixels in sparse pixel gaussian rendering#488

Merged
harrism merged 10 commits into
openvdb:mainfrom
harrism:issue-106-handle-duplicate-pixels-in-sparse-pixel-
Mar 4, 2026
Merged

Handle duplicate pixels in sparse pixel gaussian rendering#488
harrism merged 10 commits into
openvdb:mainfrom
harrism:issue-106-handle-duplicate-pixels-in-sparse-pixel-

Conversation

@harrism

@harrism harrism commented Mar 3, 2026

Copy link
Copy Markdown
Contributor

Summary

  • Deduplicate pixel coordinates at the binding layer (deduplicatePixels) before passing to computeSparseInfo and rasterization kernels, then scatter results back via index_select. This avoids any CUDA kernel modifications while correctly handling duplicate (batchIdx, row, col) entries that would otherwise cause incorrect tile bitmasks.
  • Handle edge cases: single-list JaggedTensors with empty jidx, correct jlidx passthrough for ldim() consistency, and reducing numContributingGaussians to unique-pixel space for the contributing IDs kernel.
  • Add comprehensive C++ unit tests for deduplicatePixels (20 tests covering empty, single, all-unique, duplicates, multi-batch, round-trip reconstruction, and per-batch offset validation) and Python end-to-end tests for all sparse render APIs with duplicate pixels (6 tests covering depths, images, backward gradients, num_contributing, contributing_ids, and multi-camera).

Fixes #106

Test plan

  • C++ DeduplicatePixelsTest: 20/20 pass
  • C++ GaussianComputeSparseInfoTest: 14/14 pass
  • Python TestGaussianRenderSparseDuplicatePixels: 6/6 pass

Deduplicate pixel coordinates at the binding layer before passing them
to computeSparseInfo and the rasterization kernels, then scatter results
back via index_select. This avoids CUDA kernel changes while correctly
handling duplicate (batchIdx, row, col) entries that cause incorrect
tile bitmasks.

- Add deduplicatePixels() that encodes pixels as int64 keys, sorts to
  find unique groups, and builds inverse indices for scatter-back
- Handle single-list JaggedTensors where jidx() is empty
- Pass deduplicated pixels through all sparse render paths (images,
  depths, num_contributing, contributing_ids) and scatter results back
- Reduce numContributingGaussians to unique-pixel space before passing
  to the contributing IDs kernel
- Add C++ unit tests for deduplicatePixels (20 tests)
- Add Python end-to-end tests for all sparse render APIs with
  duplicate pixels (6 tests)

Fixes openvdb#106

Signed-off-by: Mark Harris <mharris@nvidia.com>
Made-with: Cursor
@harrism harrism requested a review from a team as a code owner March 3, 2026 05:09
@harrism harrism requested review from matthewdcong and swahtz March 3, 2026 05:09
@harrism harrism added new feature New feature or request Gaussian Splatting Issues related to Gaussian splattng in the core library labels Mar 3, 2026
@harrism harrism requested a review from fwilliams March 3, 2026 05:09
Signed-off-by: Mark Harris <mharris@nvidia.com>
Made-with: Cursor

@blackencino blackencino left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, Ship it!

Looking specifically at the actual de-duplication, every part of what exists in the C++ there could have been done as a torch python operation. Is there a specific reason this needed to be compiled?

@harrism

harrism commented Mar 3, 2026

Copy link
Copy Markdown
Contributor Author

Looking specifically at the actual de-duplication, every part of what exists in the C++ there could have been done as a torch python operation. Is there a specific reason this needed to be compiled?

No reason other than that the current implementation is all C++. The original proposal was a much more complicated approach that used custom kernels. Performance may be lower in Python, naturally, but I don't think this operation will be a bottleneck.

Comment thread src/fvdb/GaussianSplat3d.cpp Outdated
Comment thread src/fvdb/GaussianSplat3d.cpp Outdated
Comment thread src/fvdb/GaussianSplat3d.cpp Outdated

@swahtz swahtz left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks great I just had a few potential optimization suggestions since these will get called on every rendering iteration.

I was also wondering for the unit tests if it would be worth having a test where all pixels in each batch are duplicates or perhaps that's redundant with the tests were just some pixels are duplicates.

harrism and others added 3 commits March 4, 2026 09:44
Use torch::bincount instead of zeros+scatter_add_ for counting unique
pixels per batch, and reuse uniqueBatchIdx directly as the new jidx
instead of round-tripping through jidx_from_joffsets.

Co-authored-by: swahtz <2375296+swahtz@users.noreply.github.com>
Signed-off-by: Mark Harris <mharris@nvidia.com>
Made-with: Cursor
cumsum on a bool tensor already returns int64, so the explicit cast
allocated an unnecessary temporary. Also update the docstring to
describe the actual sort-based dedup rather than torch::unique.

Signed-off-by: Mark Harris <mharris@nvidia.com>
Made-with: Cursor
Test the maximum-compression case where every batch collapses to a
single unique pixel, verifying offsets and inverse indices are correct.

Co-authored-by: swahtz <2375296+swahtz@users.noreply.github.com>
Signed-off-by: Mark Harris <mharris@nvidia.com>
Made-with: Cursor
@harrism

harrism commented Mar 3, 2026

Copy link
Copy Markdown
Contributor Author

Great suggestions @swahtz . Implemented your two optimizations and found one other and a stale comment. Also added your suggested maximum compression test case.

harrism added 3 commits March 4, 2026 10:50
…plicate-pixels-in-sparse-pixel-

Signed-off-by: Mark Harris <mharris@nvidia.com>
Made-with: Cursor

# Conflicts:
#	tests/unit/test_gaussian_splat_3d.py
Comment thread src/fvdb/GaussianSplat3d.cpp Outdated
Comment thread src/fvdb/GaussianSplat3d.cpp Outdated

@fwilliams fwilliams left a comment

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A few small notes on the algorithm. I think this could be tightened up to use fewer allocations. This could also be implemented with fixed up front memory using a couple of cuda kernels if you wanted to do it that way. I'll defer to you as to which you think is better.

@harrism

harrism commented Mar 4, 2026

Copy link
Copy Markdown
Contributor Author

I decided against the custom kernel approach because by definition these are low-cost operations in terms of memory and computation. For sparse rendering, N (number of pixels) should be small -- say 5-50K. I estimate memory savings of a custom kernel to be at most ~1MB at that scale. Even if we render at the scale of 500K pixels (25% of a 1080p image) we're talking only 10MB savings. On the scale of 3DGS memory usage, this is peanuts.

The main reason for custom kernels here would be launch overhead and possibly synchronization reduction. But I don't think that should be addressed until it is determined to be a bottleneck.

- Skip batch-index zeros tensor for single-list JaggedTensors by
  branching the key computation (avoids N*8 byte allocation)
- Use in-place cumsum_(0).sub_(1) for group ID assignment, computing
  firstInSorted before the mutation (avoids one N*8 byte temporary)

Co-authored-by: Francis Williams <francisw@nvidia.com>
Signed-off-by: Mark Harris <mharris@nvidia.com>
Made-with: Cursor
@harrism

harrism commented Mar 4, 2026

Copy link
Copy Markdown
Contributor Author

@fwilliams implemented the small optimizations you suggested. Torch cumsum_ doesn't work on bool tensors in CUDA, so had to cast it, so the memory savings is reduced a bit but there's still about 8N bytes savings (1 fewer allocation) for that one (rather than 16N).

@harrism harrism merged commit 110bef3 into openvdb:main Mar 4, 2026
35 checks passed
@harrism harrism deleted the issue-106-handle-duplicate-pixels-in-sparse-pixel- branch March 4, 2026 23:48
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Gaussian Splatting Issues related to Gaussian splattng in the core library new feature New feature or request

Projects

Status: Done

Development

Successfully merging this pull request may close these issues.

Handle duplicate pixels in sparse pixel gaussian rendering

4 participants