Optimize joffsets construction via pinned memory #403

Merged
matthewdcong merged 1 commit into openvdb:main from matthewdcong:jagged_pinned_memory
Jan 7, 2026

@matthewdcong matthewdcong commented Jan 6, 2026

Contributor

During Gaussian splat training, we construct several JaggedTensor instances whose small joffsets tensor is initialized on the host and then copied to the device. Page-locking these small tensors on the host lets the device read them at higher bandwidth, which accelerates the subsequent host-to-device transfer. Furthermore, with the Unified Memory backend, this reduces the amount of host <-> device synchronization required, enabling the subsequent small kernels to be launched with lower latency.

NB: The torch::tensor(...) call with an initializer list performs the initialization on the host before copying the contents to the device.
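
For context, here is a minimal sketch of the kind of host-side work involved. It assumes joffsets holds the cumulative element counts of the tensors in a JaggedTensor, so it has one more entry than the number of tensors. The helper name makeOffsets is hypothetical and uses only the standard library. It is not the code from this PR, and it does not show the page-locking itself, which goes through PyTorch's allocation options.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical helper (not part of fVDB or this PR): builds an offsets
// array on the host from per-tensor element counts.
// Assumption: offsets[i] is the index of the first element of tensor i
// and offsets.back() is the total element count.
std::vector<std::int64_t> makeOffsets(const std::vector<std::int64_t>& counts) {
    std::vector<std::int64_t> offsets(counts.size() + 1, 0);
    for (std::size_t i = 0; i < counts.size(); ++i) {
        // Running sum: each offset is the previous offset plus the
        // previous tensor's element count.
        offsets[i + 1] = offsets[i] + counts[i];
    }
    return offsets;
}
```

For a single tensor with n elements, makeOffsets({n}) yields the two-entry array {0, n}. An array this small is why the transfer cost is dominated by latency and synchronization rather than by the amount of data copied.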

@matthewdcong matthewdcong requested a review from a team as a code owner January 6, 2026 23:55
Signed-off-by: Matthew Cong <mcong@nvidia.com>

matthewdcong commented Jan 7, 2026

Contributor Author

With Unified Memory, training runs about 1.06x faster. The benefits for the PyTorch CUDA backend are more marginal because host-to-device transfers call cudaStreamSynchronize by default. I don't want to make the transfer non-blocking right now (though we could in the future) in case something else in the CUDA backend relies on the sync.

@swahtz swahtz left a comment

Contributor

Nice, good catch. I wonder if there are other situations with similar small CPU tensors that need to be created and then moved that we could apply this to.

@matthewdcong matthewdcong merged commit daef02e into openvdb:main Jan 7, 2026
32 checks passed
2 participants