ProjectedGaussianSplats opacities uses expand/view and accessors instead of per-camera copies #451
…not using antialiasing) Signed-off-by: Jonathan Swartz <jonathan@jswartz.info>
Pull request overview
This PR aims to reduce redundant per-camera opacity tensor copies for ProjectedGaussianSplats when antialiasing is disabled by storing opacities as [N] and lazily expanding to [C, N] via an accessor.
Changes:
- Updated `ProjectedGaussianSplats::opacities()` to expand `[N] -> [C, N]` via `unsqueeze().expand()`.
- Changed projection codepaths to store opacities as `[N]` when `antialias == false`, and updated rasterization call sites to use the `opacities()` accessor instead of the raw member tensor.
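The expand step described above can be illustrated without any tensor library. The following is a minimal sketch in plain C++, not the fVDB or PyTorch implementation: `View2D` and `expandPerCamera` are hypothetical names invented for this example. It shows why an expanded `[C, N]` view costs no extra memory: the camera dimension has stride 0, so every camera row aliases the same `N` values.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical 2-D strided view over float storage (not an fVDB or PyTorch type).
struct View2D {
    const float *data;   // first element of the underlying storage
    int64_t sizes[2];    // logical shape: {C, N}
    int64_t strides[2];  // element strides: {0, 1} for an expanded view

    // Element (c, n) lives at data[c * strides[0] + n * strides[1]].
    float at(int64_t c, int64_t n) const {
        assert(c >= 0 && c < sizes[0] && n >= 0 && n < sizes[1]);
        return data[c * strides[0] + n * strides[1]];
    }
};

// Present a length-N buffer as a [C, N] view without copying.
// The camera dimension gets stride 0, so all C rows read the same N values.
inline View2D expandPerCamera(const std::vector<float> &opacities, int64_t numCameras) {
    assert(numCameras > 0);
    return View2D{opacities.data(),
                  {numCameras, static_cast<int64_t>(opacities.size())},
                  {0, 1}};
}
```

With three opacities and four cameras, `expandPerCamera` reports a `[4, 3]` shape while still pointing at the original three floats; a per-camera copy would instead allocate twelve.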
Reviewed changes
Copilot reviewed 2 out of 2 changed files in this pull request and generated 7 comments.
| File | Description |
|---|---|
| `src/fvdb/GaussianSplat3d.h` | Adds an `opacities()` accessor that expands `[N]` opacities to a `[C, N]` view. |
| `src/fvdb/GaussianSplat3d.cpp` | Stores opacities as `[N]` in non-antialias projection results and switches rasterization call sites to use the accessor. |
…ward/Backward to calculate the data size of the expanded view tensors Signed-off-by: Jonathan Swartz <jonathan@jswartz.info>
…es() Signed-off-by: Jonathan Swartz <jonathan@jswartz.info>
Pull request overview
Copilot reviewed 4 out of 4 changed files in this pull request and generated 3 comments.
Signed-off-by: Jonathan Swartz <jonathan@jswartz.info>
Pull request overview
Copilot reviewed 4 out of 4 changed files in this pull request and generated 3 comments.
Signed-off-by: Jonathan Swartz <jonathan@jswartz.info>
Signed-off-by: Jonathan Swartz <jonathan@jswartz.info>
Pull request overview
Copilot reviewed 4 out of 4 changed files in this pull request and generated no new comments.
harrism
left a comment
A question and a comment.
harrism
left a comment
Thanks for correcting my confusion and adding explanation.
…ing implementations (#600)

Currently, we unsqueeze and expand the opacities (following the optimization in #451), creating a non-contiguous view. However, the image-space mGPU rasterization and world-space rasterization implementations require contiguous expanded opacities. This PR changes opacities to always be expanded for consistent behavior across multiple splatting implementations. Note that we can't simply call `contiguous()` in the specific operator impl because the shared autograd code caches the opacities for the backward pass.

This undoes the optimization in #451 and results in a small (~0.5%) decrease in performance in the single-GPU case. However, this is outweighed by the corresponding speedups in the mGPU and world-space cases due to one fewer allocation and more efficient paging.

Signed-off-by: Matthew Cong <mcong@nvidia.com>
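To make the contiguity issue in the commit message above concrete, here is a minimal sketch in plain C++. It is not fVDB code: `isContiguous2D` and `materializePerCamera` are hypothetical helpers, and the contiguity rule assumes the usual row-major convention in which size-1 dimensions are ignored. It shows that a stride-0 expanded view fails the contiguity test, and that making it contiguous means allocating and filling `C * N` values.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Row-major contiguity test for a [C, N] view given its element strides.
// Assumption: dimensions of size 1 do not affect contiguity.
inline bool isContiguous2D(int64_t C, int64_t N, int64_t strideC, int64_t strideN) {
    if (C == 0 || N == 0) return true;  // empty views are trivially contiguous
    const bool innerOk = (N == 1) || (strideN == 1);
    const bool outerOk = (C == 1) || (strideC == N);
    return innerOk && outerOk;
}

// Materialize per-camera opacities as a dense, row-major [C, N] buffer.
// This is the copy that an expanded (stride-0) view avoids.
inline std::vector<float> materializePerCamera(const std::vector<float> &opacities,
                                               int64_t numCameras) {
    assert(numCameras >= 0);
    std::vector<float> dense;
    dense.reserve(opacities.size() * static_cast<std::size_t>(numCameras));
    for (int64_t c = 0; c < numCameras; ++c) {
        dense.insert(dense.end(), opacities.begin(), opacities.end());
    }
    return dense;
}
```

For `C = 4` and `N = 3`, strides `{0, 1}` are non-contiguous while strides `{3, 1}` are contiguous, which is the layout the dense buffer returned by `materializePerCamera` has.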
Addresses a FIXME about using a view for projected Gaussian opacities (when not using antialiasing). Accessors were already being used in the kernels, so this was just a matter of updating the `ProjectedGaussianSplats.opacities()` method and using this accessor wherever `perGaussianOpacities` was used. This removes the redundant per-camera copies of opacities that were made when antialiasing is not enabled.

Because a view is now created to make the projected Gaussian opacities per-camera quantities, I changed how we calculate the number of bytes to prefetch when we call `memPrefetchAsync` before forward/backward rasterization. The calculations are now robust to this change for all tensors, so we get the correct number of bytes whether the tensor is a non-contiguous expanded view or a sliced tensor with a non-zero storage offset (e.g. `t[4:]`).
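One way such a byte calculation could look is sketched below in plain C++. This is not the code from this PR: `StridedDesc`, `beginByteOffset`, and `spannedBytes` are hypothetical names, and the sketch assumes non-negative strides. The idea is to derive the prefetch range from sizes, strides, and storage offset rather than from the element count, so an expanded view is not over-counted and a sliced tensor starts at the right address.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical description of a strided tensor (names are illustrative only).
struct StridedDesc {
    std::vector<int64_t> sizes;    // logical shape
    std::vector<int64_t> strides;  // element strides, assumed non-negative
    int64_t storageOffset;         // offset of element 0 from the storage start, in elements
    std::size_t itemSize;          // bytes per element
};

// Byte offset from the start of the storage at which the tensor's data begins.
inline std::size_t beginByteOffset(const StridedDesc &t) {
    assert(t.storageOffset >= 0);
    return static_cast<std::size_t>(t.storageOffset) * t.itemSize;
}

// Bytes spanned from the first element to one past the last element touched.
// The last element sits at linear index sum((size_d - 1) * stride_d).
inline std::size_t spannedBytes(const StridedDesc &t) {
    assert(t.sizes.size() == t.strides.size());
    int64_t lastIndex = 0;
    for (std::size_t d = 0; d < t.sizes.size(); ++d) {
        if (t.sizes[d] == 0) return 0;  // empty tensor touches no memory
        assert(t.strides[d] >= 0);
        lastIndex += (t.sizes[d] - 1) * t.strides[d];
    }
    return static_cast<std::size_t>(lastIndex + 1) * t.itemSize;
}
```

For a `[4, 3]` float view with strides `{0, 1}`, this yields 3 elements' worth of bytes instead of the 12 that a size-based count would give. For a slice like `t[4:]` of a 10-element float tensor, the range starts 4 elements into the storage and spans 6 elements.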