What's Changed
- Bump version to 0.3.1 by @swahtz in #303
- shared memory types in IntegrateTSDF kernel by @swahtz in #307
- Viewer now supports multiple scenes by @phapalova in #308
- Update nanovdb_editor version. by @areidmeyer in #309
- Implementing morton and hilbert for Grid and GridBatch ijk by @blackencino in #311
- Google analytics in docs by @fwilliams in #312
- minor edits by @kmuseth in #305
- Fix radix sort prefetch for disjoint sorts by @matthewdcong in #315
- Rename _Cpp python extension library binary to _fvdb_cpp, include pybind11 headers first by @harrism in #313
- Morton/Hilbert fixes by @blackencino in #316
- Active grid coords cleanup by @blackencino in #318
- Use smaller runtime images for non-build CI steps by @matthewdcong in #320
- Make nightly workflows not run on forks of openvdb/fvdb-core by @harrism in #319
- Changed the _Cpp.pyi filename to _fvdb_cpp.pyi by @blackencino in #322
- Convolution default fixed, extensive tests by @blackencino in #321
- Morton/Hilbert module level standalone functions by @blackencino in #323
- conda environment files: switch to conda-forge torchvision by @swahtz in #327
- Fix documentation for setting TORCH_CUDA_ARCH_LIST by @matthewdcong in #328
- Eliminate snake_case in GaussianTileIntersection.cu by @harrism in #330
- Fix GHA error in (currently unused) nightly workflows by @harrism in #329
- Disable running tests for Draft PRs by @swahtz in #339
- `build.sh` debug build fixes by @swahtz in #338
- Add viz bindings for wait and add_image by @phapalova in #332
- Render all contributing gaussian IDs/weights by @swahtz in #340
- Fix Gaussian rasterization shared memory alignment by @swahtz in #342
- [Bug Fix] Improve binary search handling in JIdxForJOffsets by @iYuqinL in #325
- Debug build gtest fix by @swahtz in #344
- GaussianProjectionForward: fix camera data loading that exceeds blockDim by @swahtz in #345
- Support backgrounds in Gaussian Rasterization by @harrism in #343
- Remove tests.yml 'push' trigger by @swahtz in #347
- Temporarily disable nightly benchmark/test job by @swahtz in #346
- Fix potential oversubscription of nvcc * cmake parallel threads by @swahtz in #351
- Hot fixes for Jcat errors by @blackencino in #352
- Fix inverted logic on computing abs(gradient) in backward Gaussian rasterization by @harrism in #355
- A pedantic pull request by @swahtz in #359
- Fix `JaggedTensor.from_*_and_list_ids` ldim=2 issue by @swahtz in #357
- Gradients added to Convolution Ground Truth Unit Tests by @blackencino in #358
- `JaggedTensor::unbind*`: Reduce number of blocking GPU->CPU copies by reusing cached lsizes by @swahtz in #360
- Use `scaled_dot_product_attention` operator from Torch, remove our SDPA by @swahtz in #364
- Added backwards/gradient tests to the default convolution unit tests. by @blackencino in #361
- Add lineinfo option to build by @harrism in #367
- Create and use getMaxSharedMemory utility by @harrism in #368
- Remove unnecessary stream sync in GaussianTileIntersection by @harrism in #370
- plumb sparse rendering functions by @fwilliams in #348
- MCMC gaussian splatting relocation kernel and unit tests by @harrism in #374
- MCMC add noise kernel and gtests by @harrism in #377
- Bindings and Pytests for MCMC functions by @harrism in #394
- Expose min_opacity parameter in Gaussian MCMC relocate function. by @harrism in #396
- Expose k and t parameters in MCMC relocation by @fwilliams in #402
- Optimize joffsets construction via pinned memory by @matthewdcong in #403
- NVIDIA Branding in docs by @fwilliams in #405
- Fix hardcoded float dtype by @matthewdcong in #406
- Prefetch fused SSIM outputs to avoid write page faults by @matthewdcong in #407
- Fix the docstrings of `fvdb.viz.Scene.camera_orbit_direction` by @swahtz in #398
- Fix NaNs in `rasterizeTopContributingGaussianIdsForward` weights by @swahtz in #400
- Plumb Sparse Gaussian TileIntersection by @swahtz in #401
- Fix/improve radix sort synchronization by @matthewdcong in #409
- Fix the derivation of the number of cameras in the rasterization kernels that would be incorrect when in packed mode. by @swahtz in #412
- CMake: create nanovdb_editor_BINARY_DIR before using it as a working directory by @harrism in #416
- Fix crash loading GaussianPly files to CPU device by @swahtz in #417
- Switch radix sort merge from host to device side synchronization by @matthewdcong in #415
- Pin PyTorch version in CI to 2.9.1 by @matthewdcong in #424
- `computeSparseInfo` small optimization by @swahtz in #428
- Fix semantics of GaussianTileIntersection's torch::cumsum by @swahtz in #427
- Fix version macro and PyTorch 2.10 build error by @matthewdcong in #423
- Multi-Axis Dispatch Framework by @blackencino in #418
- SampleGridTrilinear: Vectorized float4 loads by @swahtz in #430
- Implement PrivateUse1 (multiGPU) support for MCMC kernels by @harrism in #421
- Rasterize contributing Gaussian ID kernels optimizations by @swahtz in #429
- Expose `evaluate_spherical_harmonics` in Python bindings by @swahtz in #431
- Fix: `GaussianSphericalHarmonicsBackwards` race condition for cameras/batch-size > 1 by @swahtz in #434
- Update CUDA 13 nightly test image by @matthewdcong in #438
- Disable two GCC 13.3 warnings by @matthewdcong in #439
- Improve PyTorch build configuration time by @matthewdcong in #441
- Switch to device-centric synchronization for forEach mGPU by @matthewdcong in #440
- Fix chain rule for log_scale gradient in projection backward pass by @harrism in #433
- Gaussian Projection with the Unscented Transform by @fwilliams in #420
- Build speedups, added trace and optional pip forced install by @blackencino in #443
- Fix: `GaussianProjectionBackward` dLossDQuat missing warpSum by @swahtz in #435
- Optimize Gaussian tile intersection for mGPU by @matthewdcong in #446
- dispatch framework `for_each`, views, and tag canonicalization by @blackencino in #452
- Fix: `ProjectionForward` initializes accessors in both the initializer list and constructor body by @swahtz in #450
- `ProjectedGaussianSplats` opacities uses expand/view and accessors instead of per-camera copies by @swahtz in #451
- Remove unused sparse convolution backends by @blackencino in #454
- Add developer worktree tools for parallel development workflows by @harrism in #445
- Add AGENTS.md for AI agent guidance by @harrism in #455
- Fix fvdb-issue clipboard crash in SSH sessions by @harrism in #461
- Fix CI checks showing 'waiting for status' when paths-ignore skips workflows by @swahtz in #462
- Fix JaggedTensor single-element constructor unconditionally initializing CUDA via pinned_memory by @swahtz in #468
- CI: Skip stopping runners if starting runners was skipped by @harrism in #471
- Skip stopping runners that were skipped for CUDA 12.8 and 13.0 tests. by @harrism in #472
- Add no-argument interactive mode to fvdb-issue by @harrism in #466
- Sparse Conv Default full feature support by @blackencino in #473
- Rasterization using 3d gaussians by @fwilliams in #444
- Update nanovdb to version 32.9.0 and refine grid type checks by @swahtz in #475
- Nightly build and publish action by @swahtz in #477
- Nightly publish fix for non-existent nightly packages by @swahtz in #478
- Updated nightly build install docs by @swahtz in #481
- Downgrade nano to 32.8.0 by @swahtz in #482
- Fix multibatch mGPU race condition in SH backwards op by @matthewdcong in #484
- Upgrade openvdb git tag for NanoVDB 32.9.0 by @swahtz in #483
- Fix datatype in backward projection test by @matthewdcong in #486
- Switch from inverse to linalg_inv_ex to avoid sync by @matthewdcong in #487
- Refactor CameraIntrinsics constructor and add missing include by @blackencino in #489
- `SampleGridTrilinear` optimization: stencil + sample by @swahtz in #474
- Refactor Gaussian rendering to use composable camera model for projection and ray generation by @fwilliams in #485
- Add `masks` support to all Gaussian render methods by @swahtz in #480
- Update NanoVDB to v32.9.1 by @swahtz in #493
- Feature/op consolidation by @blackencino in #492
- SaveNanoVDB: Fix voxel size/origin metadata on serialized index grids by @swahtz in #490
- viewer fix for notebook by @zlalena in #350
- Disable unreliable rasterization deadlock test by @fwilliams in #498
- SimpleUnet bug fixes by @swahtz in #496
- Added the github CLI to the conda dev environment by @blackencino in #500
- Handle duplicate pixels in sparse pixel gaussian rendering by @harrism in #488
- Add bfloat16 support to JaggedTensor reduce operators by @swahtz in #501
- Add seed initialization in TestSimpleUNet class by @swahtz in #502
- Add unit tests for `fvdb.nn` modules by @swahtz in #497
- Pin CI checkout refs to immutable commit SHAs to fix build/test skew by @swahtz in #503
- Docs, examples, notebooks updates by @swahtz in #504
- Version class by @swahtz in #507
- Improve/optimize mGPU scaling via batched prefetching and sorting changes by @matthewdcong in #499
- Switch 32-bit tensor index accessors to 64-bit across all ops by @harrism in #505
- Fix CCCL version check macro by @matthewdcong in #509
- Add devtools script to report unanswered external issues by @harrism in #510
- Fix flaky test_jsum_list_of_lists bfloat16 test by @fwilliams in #517
- Update CONDA_OVERRIDE_CUDA to 13.0 by @swahtz in #519
- Add Slack output format and daily CI workflow for unanswered issues report by @harrism in #513
- Fix insider issues filtering for CI unanswered issues script by @harrism in #522
- Release Process Updates by @swahtz in #525
- Update CI workflow to include git installation in system dependencies by @swahtz in #526
- Add release branching docs and automation scripts by @harrism in #512
- Fix smoke test Python setup and open release PR as draft by @harrism in #528
- Make start-release.sh idempotent for safe re-runs by @harrism in #529
- Run unit tests only for matching test_environment.yml config by @harrism in #531
- Restore fix for dLossDQuat missing warpSum by @matthewdcong in #533
- Fix quaternion gradient accumulation in GaussianProjectionJaggedBackward by @swahtz in #534
- Fix publish workflow: Rocky Linux 8 containers and single unit test job by @harrism in #536
- Fix `publish.yml` python install action by @swahtz in #537
- Update publish.yml to include additional system dependencies by @swahtz in #538
- Align publish.yml build containers with tests.yml (Rocky Linux 8 / manylinux_2_28) by @swahtz in #540
- publish: dual S3 + PyPI publish on release (with GPU tests) by @swahtz in #545
- Improve mGPU partitioning for Gaussian projection operators by @matthewdcong in #547
- Improve mGPU partitioning for SH operators by @matthewdcong in #546
- `nightly-publish` fix errors for missing tools by @swahtz in #549
- Check for zero intersection case in tile intersection prefetch by @matthewdcong in #553
- Add automated doc version updates to release scripts by @harrism in #552
- `finish-release-process.sh` updates to preserve `release/*` branch integrity by @swahtz in #544
- Update NanoVDB Editor to latest. by @areidmeyer in #556
- Add CHANGES.md by @swahtz in #539
- Upgrade conda env files to gcc/gxx 14.3 by @swahtz in #557
- Replace Slack issue report with event-driven issue triage labels by @harrism in #551
- Added `nanovdb-editor` as an optional dependency by @swahtz in #559
- 0.4.2 Changelog and Docs update by @swahtz in #565
- Docs deployment workflow fix by @swahtz in #566
- Fuse computeGradientState into projectionBackwardsKernel by @matthewdcong in #560
- PyTorch 2.11 support for venv CI by @matthewdcong in #561
- Reduce shared memory usage in pinhole projection by re-arranging blocks by @matthewdcong in #555
- Centralize Github workflow and doc version configuration into shared config by @swahtz in #569
- Add `camera_fov` getter/setter to `fvdb.viz.Scene` by @swahtz in #558
- Remove torchsparse from all environment and CI configurations by @swahtz in #572
- Add cache clearing step in nightly publish workflow by @swahtz in #574
- Remove torch_scatter dependency in favor of built-in PyTorch scatter_reduce_ by @swahtz in #571
- Remove vestigial `setup.py` and GitLab CI config by @swahtz in #570
- Hotfix release process improvements by @swahtz in #563
- Update Gaussian splatting camera API and world-space parity by @fwilliams in #518
- Retire C++ GridBatch wrapper; add functional API and Grid class by @blackencino in #582
- `scaled_dot_production_attention` support for additional Torch backends (Flash Attention) by @swahtz in #365
- Fixes code samples in Sphinx docs that should have border by @swahtz in #577
- Reimplement JaggedReduce ops with PyTorch `scatter_reduce_` by @swahtz in #578
- Eliminate `.item()` synchronization stalls in hot C++ paths by @swahtz in #586
- Pass current CUDA stream to all kernel launches by @swahtz in #587
- Add TEACHME interactive lesson documents for fvdb core API by @harrism in #584
- Fix missing CUDA device guards in kernel-launching functions by @swahtz in #589
- [CI] Get nanovdb-editor from pip instead from built whl by @phapalova in #581
- Use pip package of nanovdb-editor by @phapalova in #580
- Upgrade clang-tools to 21 to fix clangd SIGSEGV on CUDA files by @fwilliams in #491
- Move Gaussian splatting autograd and pipeline logic from C++ to Python by @fwilliams in #595
- Fix tutorial docs: move from wip, fix broken APIs, add CI testing by @harrism in #592
- Fix additional sync point introduced in autograd change by @matthewdcong in #599
- Materialize repeated opacities for compatibility with multiple splatting implementations by @matthewdcong in #600
- Restore useful comments from autograd/pipeline refactor by @swahtz in #603
- Fix weighted average in TSDF integration to apply `pixelWeight` to new samples by @jinhwanlazy in #588
- Improve tutorial content based on review feedback by @harrism in #598
- Environment/Docs/URL Updates by @swahtz in #605
- Initialize gradient accumulation tensors before UT projection path by @harrism in #608
- Versioned Documentation: Documentation integration with Read the Docs by @swahtz in #610
- CI: GH Actions Version Updates by @swahtz in #611
- Fixes to Sphinx Docs Build by @swahtz in #613
- Documentation: Add pre_build job to Read the Docs configuration for version generation by @swahtz in #615
- Documentation: Fix Read the Docs build and resolve Sphinx warnings by @swahtz in #618
- Refactor Gaussian splatting ops and extract utility functions by @fwilliams in #596
- Implement GitHub Actions workflow for Sphinx documentation build test by @swahtz in #622
- Documentation: Improve version label contrast in sidebar and fix reality-capture URLs by @swahtz in #623
- Documentation: Switch docs redirect to point to main RTD URL by @swahtz in #625
- CI: Revert failing drop cache step by @phapalova in #629
- Fix URLs in the README to point to the ReadtheDocs site by @swahtz in #626
- Promote GridBatchData to public header by @swahtz in #632
- Fix `fvdb.viz.PointCloudView` use of older API by @swahtz in #631
- Reorder Gaussian2D to improve field alignment by @matthewdcong in #624
- Remove unused SH function by @matthewdcong in #630
- Add CMake installation support for public headers and configuration files by @swahtz in #633
- Fix inject_from CUDA crash when source grid has 0 voxels by @harrism in #616
- Disambiguate CI job names and fix Torch CMake header path by @swahtz in #635
- Add `sample_nearest` operator for GridBatch and Grid by @swahtz in #628
- Generalize volume_render to N channels by @swahtz in #636
- Fix build issue with SampleNearest by @swahtz in #637
- Add `__launch_bounds__` to forEach CUDA kernels by @swahtz in #638
- ci(nightly): anchor nightly version to upcoming release in pyproject.toml by @swahtz in #645
- CI: fix nightly wheel build by @phapalova in #634
- Add Vec2 and double fast paths to `SampleGridTrilinear` by @swahtz in #639
- Optimizations for `volume_render` and move its autograd layer to Python by @swahtz in #640
- Speed up builds with ccache, host PCH, and trimmed torch headers by @swahtz in #644
- NanoVDB loading: fix mixed grid type loading and add read_metadata API by @swahtz in #641
- [docs] Nightly build version numbering update for installation documentation by @swahtz in #646
- ci(publish): set short cache-control and invalidate CloudFront on index uploads by @swahtz in #647
- `saveNVDB` Optimizations by @swahtz in #650
- Shared memory optimizations for Gaussian rasterization by @matthewdcong in #554
- Avoid repeated delta computation by @matthewdcong in #651
- Add warp level early exit for forward rasterization by @matthewdcong in #658
- More optimal prefetching for mGPU Gaussian splatting by @matthewdcong in #657
- Fix Conda build failure by @matthewdcong in #661
- Improve parity with gsplat for dense rasterization by @matthewdcong in #659
- CMake: Link exported Torch target by @swahtz in #662
- fix: silence SyntaxWarning and tensor copy-construct UserWarning in tests by @mvanhorn in #654
- Improve mGPU Gaussian tile intersection by @matthewdcong in #664
- `ray_implicit_intersection` improvements by @swahtz in #663
- Improve prefetch granularity for rasterization kernels by @matthewdcong in #665
- Upgrade to PyTorch 2.11 by @swahtz in #573
- Add narrow-band SDF reinitialize/retopologize ops by @swahtz in #669
- Update dev_environment.yml to newer openusd version by @zlalena in #667
- CI: Removed the pytorch upper-bound version in `pyproject.toml` by @swahtz in #671
- CI Token Best Practices Sweep by @swahtz in #672
- Workflow Security: scope bundled shellcheck to real issues by @swahtz in #674
- CODEOWNERS: require NVIDIA maintainer review for governance/CI files by @harrism in #676
- CI: correct the change-detection gate so docs-only PRs skip cleanly by @harrism in #677
- v0.5 Release: Update CHANGES.md to reflect contributions since 0.4 by @swahtz in #675
- fix: TensorGrid blind-data lookup uses index 0 instead of loop counter by @mvanhorn in #652
- docs: fix marching_cubes return type (unique vertex indices, not normals) by @mvanhorn in #653
- Add CHANGES.md entries by @swahtz in #679
- CI: bump uv to 0.11.26 so Python 3.14 builds use stable CPython by @swahtz in #681
New Contributors
- @kmuseth made their first contribution in #305
- @iYuqinL made their first contribution in #325
- @zlalena made their first contribution in #350
- @jinhwanlazy made their first contribution in #588
- @mvanhorn made their first contribution in #654
Full Changelog: v0.3.0...v0.5.0