GaussianProjectionForward: fix camera data loading that exceeds blockDim #345
Merged
more accurate shared memory calculation Signed-off-by: Jonathan Swartz <jonathan@jswartz.info>
Contributor
Pull Request Overview
This PR fixes a critical bug in the Gaussian projection forward operation: camera data was not fully loaded once the number of cameras reached about 13, because each of the 256 threads loaded only one element and each camera contributes 21 elements (256 / 21 ≈ 12.2). The fix uses a strided loop so that all camera data is loaded into shared memory regardless of the number of cameras, and it makes the shared memory calculations safer by using sizeof().
- Replaced single-pass conditional loading with a strided loop to handle camera counts exceeding block dimensions
- Changed shared memory type from `T[]` to `char[]` and updated pointer arithmetic to use `sizeof()` for type-safe offset calculations
- Made shared memory size calculations more explicit and type-safe using `sizeof(nanovdb::math::Mat3<scalar_t>)` and `sizeof(nanovdb::math::Vec3<scalar_t>)`
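The difference between the single-pass load and the strided loop can be illustrated with a small host-side model. This is only a sketch in plain C++, not the kernel code from the PR: the function names are hypothetical, and the "threads" are simulated with an ordinary loop. It counts how many of `total` elements end up written for a block of `blockDim` threads.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper: number of elements that were written.
std::size_t countTrue(const std::vector<bool>& flags) {
    std::size_t count = 0;
    for (bool b : flags) {
        count += b ? 1 : 0;
    }
    return count;
}

// Single pass: simulated thread `tid` writes element `tid` only, so any
// element at index >= blockDim is never written.
std::size_t countLoadedSinglePass(std::size_t blockDim, std::size_t total) {
    std::vector<bool> loaded(total, false);
    for (std::size_t tid = 0; tid < blockDim; ++tid) {
        if (tid < total) {
            loaded[tid] = true;
        }
    }
    return countTrue(loaded);
}

// Strided loop: simulated thread `tid` writes elements tid, tid + blockDim,
// tid + 2 * blockDim, ... so every index below `total` is covered.
std::size_t countLoadedStrided(std::size_t blockDim, std::size_t total) {
    assert(blockDim > 0);  // a zero stride would never advance
    std::vector<bool> loaded(total, false);
    for (std::size_t tid = 0; tid < blockDim; ++tid) {
        for (std::size_t i = tid; i < total; i += blockDim) {
            loaded[i] = true;
        }
    }
    return countTrue(loaded);
}
```

With 256 threads and 21 elements per camera, 12 cameras (252 elements) fit in a single pass, while 13 cameras (273 elements) leave the last 17 elements unwritten unless the strided form is used.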
harrism
approved these changes
Nov 19, 2025
harrism
left a comment
Contributor
Nice fix! Just think maybe the shared declaration should be aligned as in your other recent PR.
Signed-off-by: Jonathan Swartz <jonathan@jswartz.info>
In `loadCamerasIntoSharedMemory`, each thread loaded one element of camera data into shared memory. However, if the amount of camera data exceeded the number of threads, not all of the camera data would be loaded. This happens at around camera 13 (256 / 21 elements per camera). I changed the loading to a strided loop so that all camera data is loaded.

Also, I changed the shared memory size calculation from a manual computation to `sizeof()` of the appropriate `nanovdb::math` structs, for safety's sake.
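A minimal sketch of the `sizeof()`-based sizing and offset arithmetic, in plain C++ rather than kernel code. Everything named here is an assumption for illustration: `Mat3` and `Vec3` are stand-ins for the `nanovdb::math` structs, and the per-camera layout of two 3x3 matrices plus one 3-vector is just one layout that would match the 21 elements per camera mentioned above; the PR text does not spell out the actual layout.

```cpp
#include <cstddef>

// Hypothetical stand-ins for nanovdb::math::Mat3<T> and nanovdb::math::Vec3<T>.
template <typename T> struct Mat3 { T m[9]; };
template <typename T> struct Vec3 { T v[3]; };

// Bytes needed for `numCameras` cameras under the assumed layout
// (two Mat3 and one Vec3 per camera), derived from the types themselves
// instead of a hand-counted element total.
template <typename T>
constexpr std::size_t cameraSharedBytes(std::size_t numCameras) {
    return numCameras * (2 * sizeof(Mat3<T>) + sizeof(Vec3<T>));
}

// Typed views into one raw byte buffer; the field names are hypothetical.
template <typename T>
struct CameraViews {
    Mat3<T>* rotations;
    Vec3<T>* translations;
    Mat3<T>* projections;
};

// Carve a char buffer into three consecutive arrays. `base` is assumed to be
// suitably aligned for T; each offset is a multiple of sizeof(T), so the
// later arrays keep that alignment.
template <typename T>
CameraViews<T> carveCameraBuffer(char* base, std::size_t numCameras) {
    CameraViews<T> views;
    char* cursor = base;
    views.rotations = reinterpret_cast<Mat3<T>*>(cursor);
    cursor += numCameras * sizeof(Mat3<T>);
    views.translations = reinterpret_cast<Vec3<T>*>(cursor);
    cursor += numCameras * sizeof(Vec3<T>);
    views.projections = reinterpret_cast<Mat3<T>*>(cursor);
    return views;
}
```

Declaring the buffer as bytes and advancing by `sizeof()` keeps the size computation and the offsets tied to the same type definitions, so a change to either struct cannot silently desynchronize them.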