Optimize joffsets construction via pinned memory by matthewdcong · Pull Request #403 · openvdb/fvdb-core

matthewdcong · 2026-01-06T23:55:46Z

During Gaussian splat training, we construct several JaggedTensor instances where the small joffsets tensor is initialized on the host and then copied over to the device. Page-locking these small tensors on the host lets the device read them at a higher bandwidth, which accelerates the subsequent host-to-device transfer. Furthermore, with the Unified Memory backend, this reduces the amount of host <-> device synchronization required, enabling the subsequent small kernels to be launched with lower latency.

NB: The torch::tensor(...) call with the initializer list performs the initialization on the host before copying the contents to the device.

Signed-off-by: Matthew Cong <mcong@nvidia.com>

matthewdcong · 2026-01-07T00:23:21Z

With Unified Memory, training runs about 1.06x faster. The benefit for the PyTorch CUDA backend is more marginal because host-to-device transfers call cudaStreamSynchronize by default. I don't want to make the transfer non-blocking right now (though we could in the future) in case something else in the CUDA backend relies on the sync.
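
The pinned-memory staging discussed above can be sketched in Python. This is an illustration only, not the code from this PR, which changes the C++ side. The helper names offsets_from_lengths and offsets_to_device are hypothetical, and the prefix-sum layout of joffsets is an assumption. The sketch uses the pin_memory argument of torch.tensor and keeps the copy blocking by default, matching the choice described above.

```python
from itertools import accumulate

try:
    import torch
except ImportError:  # keeps the offset helper usable without PyTorch
    torch = None


def offsets_from_lengths(lengths):
    """Build prefix-sum offsets [0, l0, l0 + l1, ...].

    Assumption: joffsets stores one boundary per list plus a leading 0.
    """
    return [0, *accumulate(lengths)]


def offsets_to_device(offsets, device="cuda", non_blocking=False):
    """Stage offsets in page-locked host memory, then copy to the device.

    Hypothetical helper: falls back to a plain host tensor when CUDA is
    unavailable, because pinning needs a CUDA-capable runtime.
    """
    if torch is None:
        raise RuntimeError("PyTorch is required for the transfer step")
    use_cuda = torch.cuda.is_available() and torch.device(device).type == "cuda"
    # pin_memory=True allocates the host tensor in page-locked memory.
    host = torch.tensor(offsets, dtype=torch.int64, pin_memory=use_cuda)
    if not use_cuda:
        return host
    # Blocking by default; non_blocking=True would skip the host-side wait.
    return host.to(device, non_blocking=non_blocking)
```

For example, offsets_to_device(offsets_from_lengths([3, 0, 2])) stages the four offsets 0, 3, 3, 5 in pinned memory before the copy. Passing non_blocking=True is only safe if nothing downstream depends on the implicit synchronization.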

swahtz

Nice, good catch. I wonder if there are other situations with similar small CPU tensors that need to be created and then moved that we could apply this to.

matthewdcong requested a review from a team as a code owner January 6, 2026 23:55

matthewdcong requested review from blackencino and swahtz January 6, 2026 23:55

Optimize joffsets construction via pinned memory

f717775

Signed-off-by: Matthew Cong <mcong@nvidia.com>

matthewdcong force-pushed the jagged_pinned_memory branch from 1a68226 to f717775 Compare January 6, 2026 23:57

swahtz approved these changes Jan 7, 2026

matthewdcong merged commit daef02e into openvdb:main Jan 7, 2026
32 checks passed
