
LangSplatv2: Implement mask + CLIP feature transforms #42

Merged
swahtz merged 7 commits into openvdb:main from swahtz:langsplatv2_feature_transforms
Feb 9, 2026

Conversation

@swahtz (Contributor) commented Jan 28, 2026

Implements the preprocessing pipeline that generates SAM2 masks and CLIP features as transforms, in the same style as garfvdb's implementation.
closes #31

Implement SAM2 and CLIP data transforms that work together in a pipeline to produce the features needed for LangSplatV2
closes openvdb#31

Signed-off-by: Jonathan Swartz <jonathan@jswartz.info>
Copilot AI left a comment


Pull request overview

Adds a LangSplatV2-style preprocessing pipeline by introducing scene transforms to generate multi-scale SAM2 masks and compute CLIP features for masked regions, plus configuration and packaging scaffolding.

Changes:

  • Introduces ComputeMultiScaleSAM2Masks to generate and cache multi-scale SAM2 masks with NMS post-processing (see the mask-NMS sketch after this list).
  • Introduces ComputeCLIPFeatures to encode masked regions with OpenCLIP and cache features + per-scale segmentation maps (see the CLIP-encoding sketch after this list).
  • Adds LangSplatV2 preprocessing config dataclasses and a new Python package definition.
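
A minimal sketch of the mask NMS step referenced in the first bullet: the standard greedy suppression loop, applied to binary masks rather than boxes. The threshold and scoring convention here are assumptions; the actual `ComputeMultiScaleSAM2Masks` implementation may differ.

```python
import torch

def mask_nms(masks: torch.Tensor, scores: torch.Tensor, iou_thresh: float = 0.8) -> torch.Tensor:
    """Greedy NMS over binary masks.

    masks:  (N, H, W) bool tensor of candidate masks
    scores: (N,) confidence per mask
    Returns indices of the masks to keep, highest score first.
    """
    order = torch.argsort(scores, descending=True)
    flat = masks.reshape(masks.shape[0], -1).float()
    areas = flat.sum(dim=1)
    kept: list[int] = []
    for i in order.tolist():
        suppressed = False
        for j in kept:
            inter = (flat[i] * flat[j]).sum()
            iou = inter / (areas[i] + areas[j] - inter + 1e-6)
            if iou > iou_thresh:
                suppressed = True
                break
        if not suppressed:
            kept.append(i)
    return torch.tensor(kept, dtype=torch.long)
```

And a sketch of the masked-region CLIP encoding from the second bullet: crop each mask's bounding box, black out pixels outside the mask, and batch the crops through OpenCLIP. The model name, pretrained tag, and cropping strategy are illustrative assumptions, not necessarily what `ComputeCLIPFeatures` uses.

```python
import numpy as np
import open_clip
import torch
from PIL import Image

# Model and pretrained tag chosen for illustration only.
model, _, preprocess = open_clip.create_model_and_transforms(
    "ViT-B-16", pretrained="laion2b_s34b_b88k"
)
model = model.eval().cuda()

@torch.no_grad()
def encode_masked_regions(image: np.ndarray, masks: np.ndarray) -> torch.Tensor:
    """Encode each masked region of an (H, W, 3) uint8 RGB image with CLIP.

    masks: (N, H, W) boolean array of non-empty masks.
    Returns L2-normalized (N, D) image features.
    """
    crops = []
    for m in masks:
        ys, xs = np.nonzero(m)
        y0, y1, x0, x1 = ys.min(), ys.max() + 1, xs.min(), xs.max() + 1
        region = image[y0:y1, x0:x1].copy()
        region[~m[y0:y1, x0:x1]] = 0  # black out pixels outside the mask
        crops.append(preprocess(Image.fromarray(region)))
    feats = model.encode_image(torch.stack(crops).cuda())
    return feats / feats.norm(dim=-1, keepdim=True)
```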

Reviewed changes

Copilot reviewed 6 out of 7 changed files in this pull request and generated 10 comments.

| File | Description |
| --- | --- |
| `open_vocabulary_segmentation/langsplatv2/pyproject.toml` | Adds packaging metadata and dependencies for the new langsplatv2 module. |
| `open_vocabulary_segmentation/langsplatv2/langsplatv2/scene_transforms/multi_scale_sam_masks.py` | New transform to compute + cache multi-scale SAM2 masks and apply mask NMS. |
| `open_vocabulary_segmentation/langsplatv2/langsplatv2/scene_transforms/clip_feature_encoding.py` | New transform to compute + cache CLIP features for masked regions and build segmentation index maps. |
| `open_vocabulary_segmentation/langsplatv2/langsplatv2/scene_transforms/__init__.py` | Exposes the new transforms at the package level. |
| `open_vocabulary_segmentation/langsplatv2/langsplatv2/config.py` | Adds pipeline configuration and a helper to assemble the transform sequence. |
| `instance_segmentation/garfvdb/garfvdb/util.py` | Refactors RGB→SH conversion to use module-level constants. |
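
The garfvdb change in the last row moves the spherical-harmonics DC constant out of the conversion functions and up to module level. For reference, a minimal sketch of the standard degree-0 RGB↔SH conversion used by Gaussian-splatting code; the function names here are illustrative and may not match `garfvdb/util.py`.

```python
import torch

# Degree-0 spherical-harmonics basis constant: 1 / (2 * sqrt(pi)).
_SH_C0 = 0.28209479177387814

def rgb_to_sh(rgb: torch.Tensor) -> torch.Tensor:
    """Convert [0, 1] RGB values to degree-0 SH (DC) coefficients."""
    return (rgb - 0.5) / _SH_C0

def sh_to_rgb(sh: torch.Tensor) -> torch.Tensor:
    """Inverse of rgb_to_sh: recover RGB from DC coefficients."""
    return sh * _SH_C0 + 0.5
```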


swahtz and others added 4 commits January 29, 2026 09:21
…nsforms/clip_feature_encoding.py

Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
Signed-off-by: Jonathan Swartz <jonathan@jswartz.info>
…nsforms/clip_feature_encoding.py

Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
Signed-off-by: Jonathan Swartz <jonathan@jswartz.info>
swahtz merged commit bdd5df0 into openvdb:main on Feb 9, 2026
8 checks passed
swahtz deleted the langsplatv2_feature_transforms branch on February 9, 2026 at 04:11
swahtz added a commit that referenced this pull request Feb 12, 2026
## Summary

Implements the LangSplatV2 training pipeline for learning per-Gaussian
sparse language feature fields using fVDB and fvdb-reality-capture. This
builds on the preprocessing transforms already merged in #42, adding the
model, loss, training loop, and supporting utilities needed to train
language-aware Gaussian splats. closes #32

Key components:

- **Model** (`model.py`): `LangSplatV2Model` wraps a frozen
`GaussianSplat3d` with learnable per-Gaussian logits and codebooks.
Renders sparse coefficient weight maps via splatting and decodes them
into dense CLIP feature maps through codebook lookup.
- **Vector quantization** (`vq_utils.py`): Implements
`softmax_to_topk_soft_code` for efficient sparse coefficient generation
and `ResidualVectorQuantization` for K-means codebook initialization
from ground-truth CLIP features (see the sketch after this list).
- **Loss** (`loss.py`): Cosine similarity and L1 losses with per-pixel
masking for regions without valid language features.
- **Dataset** (`training/dataset.py`): `LangSplatV2Dataset` loads
pre-computed CLIP features and segmentation maps in compact form. Dense
ground-truth feature maps are materialized on-device after transfer
using `build_feature_map`, avoiding large CPU-to-GPU transfers.
- **Training runner** (`training/trainer.py`): `LangSplatV2Training`
handles the full workflow — dataset construction, K-means codebook
initialization, optimizer setup, training/eval loops with gradient
accumulation, and checkpointing.
- **Config** (`config.py`): Extended with `LangSplatV2ModelConfig` and
`LangSplatV2TrainingConfig` dataclasses.
- **Entry point** (`train_langsplatv2.py`): CLI script using `tyro` for
launching training.
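
To make the **Model**, **Vector quantization**, and **Loss** bullets concrete, here is a hedged sketch of the core path: logits become sparse top-k convex weights, a rendered weight map is decoded into CLIP-space features via the codebook, and the result is compared against ground truth with a masked cosine + L1 objective. Signatures, tensor layouts, and the loss weighting are assumptions based on the description above, not the repository's actual implementation.

```python
import torch
import torch.nn.functional as F

def softmax_to_topk_soft_code(logits: torch.Tensor, k: int) -> torch.Tensor:
    """Map logits (..., K) to sparse convex weights over a K-entry codebook:
    softmax, keep only the top-k entries, renormalize them to sum to 1."""
    probs = F.softmax(logits, dim=-1)
    topk_vals, topk_idx = probs.topk(k, dim=-1)
    sparse = torch.zeros_like(probs)
    sparse.scatter_(-1, topk_idx, topk_vals / topk_vals.sum(dim=-1, keepdim=True))
    return sparse

def decode_feature_map(weight_map: torch.Tensor, codebook: torch.Tensor) -> torch.Tensor:
    """Decode a rendered (H, W, K) coefficient map into a dense (H, W, D)
    CLIP-space feature map as a weighted sum of codebook entries (K, D)."""
    return weight_map @ codebook

def masked_feature_loss(pred: torch.Tensor, target: torch.Tensor, valid: torch.Tensor) -> torch.Tensor:
    """Cosine + L1 loss restricted to pixels that have valid language features.

    pred, target: (H, W, D) feature maps; valid: (H, W) bool mask.
    """
    pred_v, target_v = pred[valid], target[valid]
    cos = 1.0 - F.cosine_similarity(pred_v, target_v, dim=-1).mean()
    l1 = (pred_v - target_v).abs().mean()
    return cos + l1
```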

### Performance optimizations

- Compact feature storage with `JaggedTensor` for variable-length
per-image features, avoiding padding overhead
- GPU-side dense feature map construction (`build_feature_map`) using
`torch.empty` to eliminate costly zero-fill of ~4 GB tensors
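
A sketch of the GPU-side dense feature-map construction from the second bullet: a compact (S, D) feature table and an (H, W) per-pixel segment-index map are expanded on-device, with `torch.empty` so the multi-gigabyte output is never zero-filled. The signature is an assumption based on the description of `build_feature_map`.

```python
import torch

def build_feature_map(seg_map: torch.Tensor, seg_features: torch.Tensor) -> tuple[torch.Tensor, torch.Tensor]:
    """Expand compact per-segment CLIP features into a dense per-pixel map.

    seg_map:      (H, W) long tensor; -1 where no segment covers the pixel
    seg_features: (S, D) feature table, already on the target device
    Returns an (H, W, D) feature map and an (H, W) validity mask.
    """
    H, W = seg_map.shape
    D = seg_features.shape[1]
    valid = seg_map >= 0
    # torch.empty skips zero-filling the large output; pixels outside every
    # segment stay uninitialized and must be ignored via the returned mask.
    out = torch.empty(H, W, D, device=seg_features.device, dtype=seg_features.dtype)
    out[valid] = seg_features[seg_map[valid]]
    return out, valid
```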

## Test plan

- [x] Run training on a preprocessed scene: `python train_langsplatv2.py
--scene-dir <path> --checkpoint-dir <path>`
- [x] Verify K-means codebook initialization completes and logs cluster
info
- [x] Confirm training loop progresses without OOM on a single GPU
(tested with 1080p images)
- [x] Check that checkpoints are saved and can be resumed
- [x] Profile with Nsight Systems to verify no unexpected data transfer
bottlenecks

---------

Signed-off-by: Jonathan Swartz <jonathan@jswartz.info>
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>