Project page | Paper | Long 360 | VRU Dataset
ClipGStream: Clip-Stream Gaussian Splatting for Any Length and Any Motion Multi-View Dynamic Scene Reconstruction
Jie Liang, Jiahao Wu, Chao Wang, Jiayu Yang, Xiaoyun Zheng, Kaiqiang Xiong, Zhanke Wang, Jinbo Yan, Feng Gao, Ronggang Wang
Guangdong Provincial Key Laboratory of Ultra High Definition Immersive Media Technology, Shenzhen Graduate School, Peking University, Pengcheng Lab, Peking University
CVPR 2026
This repository is the official implementation of "ClipGStream: Clip-Stream Gaussian Splatting for Any Length and Any Motion Multi-View Dynamic Scene Reconstruction".
In this paper, we propose a hybrid reconstruction framework, Clip-Stream, which performs stream-level optimization at the clip granularity rather than at the frame level. This design enables scalable and temporally coherent reconstruction of long dynamic sequences, effectively eliminating flickering artifacts.
We tested on a server running Ubuntu 20.04 with CUDA 11.8 and GCC 11.4.0. Similar configurations should also work, but we have not verified each one individually. The environment requirements are not strict: any environment that can run 3DGS properly should also be able to run our program. Some extra packages are also required, such as tiny-cuda-nn.
- Clone this repo:
git clone https://github.com/liangjie1999/ClipGStream --recursive
cd ClipGStream
- Install dependencies:
# Before running the following bash commands, check your CUDA version
# Use 'echo $CUDA_HOME' to check (we use CUDA 11.8)
conda env create --file environment.yml
conda activate ClipGStream
cd submodules/diff-gaussian-rasterization
CC=gcc-11 CXX=g++-11 CUDAHOSTCXX=/usr/bin/g++-11 pip install . --no-build-isolation
cd ../simple-knn
CC=gcc-11 CXX=g++-11 CUDAHOSTCXX=/usr/bin/g++-11 pip install . --no-build-isolation
# Alternatively, you can omit the CC=gcc-11 CXX=g++-11 CUDAHOSTCXX=/usr/bin/g++-11 prefix
After that, you need to install tiny-cuda-nn (1.7).
# Here we provide commands to install tiny-cuda-nn. If your CUDA version differs from ours, you may need to follow the link above to install a matching tiny-cuda-nn.
cd ./tiny-cuda-nn-master/bindings/torch
python setup.py install
Quickly launch using only a single command.
We provide a tiny dataset (20 frames) for quick demonstration. This dataset includes multi-view images and has been fully preprocessed. We will use frames 0-10 as the Reference Clip and frames 10-20 as the Source Clip.
Download the dataset from our GitHub Releases
long_360_tiny_dataset
|---frame000000
|---frame000001
|   |---images
|       |---<image 1>
|       |---...
|---frame000002
|---...
|---sparse # information of camera
|---plys
    |--- 0.ply # point cloud of reference clip
    |--- 1.ply # point cloud of source clip (residual point cloud)
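Before launching training, it can help to confirm that the downloaded folder matches the layout above. The following is a minimal sketch, not part of this repository: the helper name `check_dataset_layout` is hypothetical, and it assumes that every `frame*` folder contains a non-empty `images` folder and that `sparse` and `plys` sit at the dataset root, as the tree suggests.

```python
from pathlib import Path


def check_dataset_layout(root):
    """Return a list of layout problems; an empty list means nothing is missing.

    Hypothetical helper: the expected folder names are taken from the
    directory tree in this README, not from the repository's own loader.
    """
    root = Path(root)
    problems = []

    # Each frame folder is assumed to hold an images/ folder with at least one file.
    frame_dirs = sorted(p for p in root.glob("frame*") if p.is_dir())
    if not frame_dirs:
        problems.append("no frame* directories found")
    for frame_dir in frame_dirs:
        images = frame_dir / "images"
        if not images.is_dir() or not any(images.iterdir()):
            problems.append(f"{frame_dir.name}: missing or empty images/ folder")

    # Camera information lives in sparse/ at the dataset root.
    if not (root / "sparse").is_dir():
        problems.append("missing sparse/ folder")

    # 0.ply is the reference clip point cloud, 1.ply the source clip point cloud.
    for ply in ("0.ply", "1.ply"):
        if not (root / "plys" / ply).is_file():
            problems.append(f"missing plys/{ply}")

    return problems
```

Calling `check_dataset_layout("/path/to/long_360_tiny_dataset")` with your own dataset path returns an empty list when all of the expected entries are present.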
You first need to set source_path (the dataset path) in the runTinyDataset.sh file, and then run the following script.
./scripts/tiny_long_360/run.sh
You can find the results in ./output/tiny_long_360. The result structure is as follows:
|--- history
|   |--- decoder
|       |--- mlp_color.pth
|       |--- mlp_cov.pth
|       |--- mlp_offset.pth
|       |--- mlp_opacity.pth
|   |--- point
|       |--- 0.ply
|       |--- 1.ply
|   |--- FDHash_0.pth
|   |--- FDHash_1.pth
|--- test
|   |--- renders # render images
|   |--- videos # render videos
|   |--- 0.csv # metrics of test view 0
|   |--- 1.csv
|   |--- 2.csv
|   |--- 3.csv
The quantitative results on our tiny dataset (stored in 0.csv, 1.csv, ...) are shown below:
| Test View | PSNR | DSSIM1 | DSSIM2 | LPIPS |
|---|---|---|---|---|
| 0 | 22.90 | 0.101 | 0.043 | 0.197 |
| 1 | 23.68 | 0.100 | 0.043 | 0.210 |
| 2 | 24.80 | 0.091 | 0.038 | 0.194 |
| 3 | 22.50 | 0.129 | 0.060 | 0.237 |
| AVG | 23.47 | 0.105 | 0.046 | 0.209 |
Download the dataset from our GitHub Releases
For the Long 360 dataset, we provide a portion of the processed data here, which includes the camera pose of the first frame (frame 0) and the point clouds of all clips: Basketball_gz_cameras_pointcloud.zip. The only additional steps are to run Step 1 of this script to convert the videos to images, and then run Step 3 and Step 4 to undistort the remaining frames and configure the data paths correctly.
long_360_tiny_dataset
|---frame000000
|   |--- sparse # information of camera
|   |--- images # undistorted image
|---view000.mp4
|---view001.mp4
|   ...
|---view035.mp4
|---...
|---sparse
|---plys
    |--- 0.ply # point cloud of reference clip
    |--- 1.ply # point cloud of source clip 1 (residual point cloud)
    |    ...
    |--- n.ply # point cloud of source clip n
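The numbering of the point clouds in `plys` follows the clip index: `0.ply` belongs to the Reference Clip and `k.ply` to Source Clip k. As a small illustration, the sketch below maps a frame index to its clip and point cloud file. The function names are hypothetical, and the sketch assumes consecutive, equally sized clips in which clip k covers frames k * clip_size up to (but not including) (k + 1) * clip_size.

```python
def clip_index_for_frame(frame, clip_size):
    """Index of the clip containing `frame` (0 = Reference Clip).

    Assumes consecutive clips of equal length, which is an assumption
    of this sketch rather than a documented guarantee of the repository.
    """
    if frame < 0:
        raise ValueError("frame must be non-negative")
    if clip_size <= 0:
        raise ValueError("clip_size must be positive")
    return frame // clip_size


def ply_name_for_frame(frame, clip_size):
    """Name of the point cloud file in plys/ for the clip containing `frame`."""
    return f"{clip_index_for_frame(frame, clip_size)}.ply"
```

With a clip size of 10, `ply_name_for_frame(9, 10)` gives `0.ply` and `ply_name_for_frame(10, 10)` gives `1.ply`.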
./scripts/long_360/trainReferenceClip.sh
./scripts/long_360/trainSourceClip.sh # if you have multiple GPUs, you can run ./scripts/long_360/parallel_source_clip/generateTrainingCmd.py
./scripts/render.sh # render & metric
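Because Source Clips can be trained independently once the Reference Clip is finished, their training commands can be spread over several GPUs. The sketch below shows one way to do this with a round-robin assignment. It is not the repository's generateTrainingCmd.py: the function name `build_source_clip_jobs` is hypothetical, the example paths and the 40-frame sequence are placeholders, and only the command-line flags are taken from the training commands in this README.

```python
import shlex


def build_source_clip_jobs(clip_ranges, gpu_ids, source_path, model_path,
                           config, total_frames, clip_size, iterations=5000):
    """Build one shell command per Source Clip, assigning GPUs round-robin.

    clip_ranges: list of (start, end) frame pairs, one per Source Clip.
    gpu_ids:     list of GPU indices to cycle through.
    """
    jobs = []
    for i, (start, end) in enumerate(clip_ranges):
        gpu = gpu_ids[i % len(gpu_ids)]  # round-robin GPU assignment
        args = [
            "python", "trainSourceClip.py",
            "--project_total_frames", str(total_frames),
            "--clip_size", str(clip_size),
            "--iterations", str(iterations),
            "-s", source_path,
            "-m", model_path,  # must match the Reference Clip's output folder
            "--frames_start_end", str(start), str(end),
            "--configs", config,
        ]
        jobs.append(f"CUDA_VISIBLE_DEVICES={gpu} " + shlex.join(args))
    return jobs


# Placeholder example: a 40-frame sequence with three Source Clips on two GPUs.
for cmd in build_source_clip_jobs(
        clip_ranges=[(10, 20), (20, 30), (30, 40)],
        gpu_ids=[0, 1],
        source_path="/path/to/dataset",
        model_path="./output/my_scene",
        config="arguments/tiny/basketball.py",
        total_frames=40,
        clip_size=10):
    print(cmd)
```

The printed commands can be pasted into separate terminals or a job scheduler. Commands that share a GPU should run one after another.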
| Test View | PSNR | DSSIM1 | DSSIM2 | LPIPS |
|---|---|---|---|---|
| 0 | 24.03 | 0.069 | 0.032 | 0.122 |
| 1 | 24.95 | 0.068 | 0.031 | 0.135 |
| 2 | 26.57 | 0.059 | 0.025 | 0.117 |
| 3 | 22.86 | 0.111 | 0.053 | 0.183 |
| AVG | 24.60 | 0.077 | 0.035 | 0.139 |
We provide a method to process custom data (multi-view video streams) into our dataset format. For details, please refer to data_process/custom_dataset/
Taking training on the tiny_long_360 dataset as an example (./scripts/tiny_long_360/), we divide 20 frames of multi-view images in order into 2 clips, each with a length of 10. The first clip is the Reference Clip, and subsequent clips are Source Clips. Training consists of two steps:
- Train the Reference Clip: The Reference Clip serves as a foundational representation of the scene, which will be inherited by subsequent clips to prevent flickering issues between clips. The following parameters are used:
clip_size sets the length of a single clip, project_total_frames defines the total sequence length, and frames_start_end specifies the start and end frames for the Reference Clip.
CUDA_VISIBLE_DEVICES=1 python trainReferenceClip.py --project_total_frames 20 --clip_size 10 --iterations 5000 -s "/data8/dataset/longvideos/jpg/long_360_tiny_dataset/" -m ./output/tiny_long_360 --frames_start_end 0 10 --configs arguments/tiny/basketball.py
# --project_total_frames: N, the frame count of the input video
# --clip_size: M, the frame count of a single clip
# --frames_start_end: start and end frames of the Reference Clip
python trainReferenceClip.py --project_total_frames 20 \
    --clip_size 10 \
    --frames_start_end 0 10 \
    --iterations 5000 \
    -s "/amax/long_360_tiny_dataset/" \
    -m ./output/tiny_long_360 \
    --configs arguments/tiny/basketball.py
- Subsequent Source Clips are trained by inheriting the static information (including anchors, static features, and the decoder) from the Reference Clip. The -m parameter should be set to the same value as when training the Reference Clip, so that the static information is inherited. Additionally, frames_start_end needs to be adjusted accordingly. Note that each Source Clip is trained independently, so training can be parallelized across clips to improve speed.
CUDA_VISIBLE_DEVICES=1 python trainSourceClip.py --project_total_frames 20 --clip_size 10 --iterations 5000 -s "/data8/dataset/longvideos/jpg/long_360_tiny_dataset/" -m ./output/tiny_long_360 --frames_start_end 10 20 --configs arguments/tiny/basketball.py
# --project_total_frames: N, the frame count of the input video
# --clip_size: M, the frame count of a single clip
# --frames_start_end: start and end frames of this Source Clip
python trainSourceClip.py --project_total_frames 20 \
    --clip_size 10 \
    --frames_start_end 10 20 \
    --iterations 5000 \
    -s "/amax/long_360_tiny_dataset/" \
    -m ./output/tiny_long_360 \
    --configs arguments/tiny/basketball.py
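The values passed to frames_start_end follow directly from the total frame count and the clip size. The sketch below derives them for a whole sequence. The function name `clip_ranges` is hypothetical, and the handling of a sequence whose length is not a multiple of the clip size (a shorter last clip) is an assumption of this sketch, not documented behavior of the training scripts.

```python
def clip_ranges(total_frames, clip_size):
    """Split [0, total_frames) into consecutive (start, end) clips.

    The first pair is the Reference Clip; the rest are Source Clips.
    The last clip is shortened if total_frames is not a multiple of
    clip_size (an assumption of this sketch).
    """
    if total_frames <= 0 or clip_size <= 0:
        raise ValueError("total_frames and clip_size must be positive")
    return [(start, min(start + clip_size, total_frames))
            for start in range(0, total_frames, clip_size)]


# The tiny dataset: 20 frames split into clips of 10.
reference, *sources = clip_ranges(20, 10)
print("Reference Clip:", reference)
print("Source Clips:", sources)
```

For the tiny dataset this yields the pair (0, 10) for the Reference Clip and (10, 20) for the single Source Clip, matching the frames_start_end values used in the commands above.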
CUDA_VISIBLE_DEVICES=1 python render.py --project_total_frames 20 --clip_size 10 -s $source_path --iteration $iteration -m ./output/tiny_long_360 --frames_start_end 0 20 --configs arguments/tiny/basketball.py --skip_video --skip_train
# --project_total_frames: N, the frame count of the input video
# --clip_size: M, the frame count of a single clip
# --frames_start_end: start and end frames covering all clips
python render.py --project_total_frames 20 \
    --clip_size 10 \
    --frames_start_end 0 20 \
    --iteration 5000 \
    -s "/amax/long_360_tiny_dataset/" \
    -m ./output/tiny_long_360 \
    --configs arguments/tiny/basketball.py \
    --skip_train \
    --skip_video
python metrics.py -m ./output/tiny_long_360 --iteration 5000
python images2video.py -m ./output/tiny_long_360/ --iteration 5000
@InProceedings{Liang_2026_CVPR,
author = {Liang, Jie and Wu, Jiahao and Wang, Chao and Yang, Jiayu and Zheng, Xiaoyun and Xiong, Kaiqiang and Wang, Zhanke and Yan, Jinbo and Gao, Feng and Wang, Ronggang},
title = {ClipGStream: Clip-Stream Gaussian Splatting for Any Length and Any Motion Multi-View Dynamic Scene Reconstruction},
booktitle = {Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)},
month = {June},
year = {2026},
pages = {41022-41032}
}
@inproceedings{wu2025localdygs,
title={LocalDyGS: Multi-view Global Dynamic Scene Modeling via Adaptive Local Implicit Feature Decoupling},
author={Wu, Jiahao and Peng, Rui and Jiao, Jianbo and Yang, Jiayu and Tang, Luyang and Xiong, Kaiqiang and Liang, Jie and Yan, Jinbo and Liu, Runling and Wang, Ronggang},
booktitle={Proceedings of the IEEE/CVF International Conference on Computer Vision},
pages={9519--9529},
year={2025}
}

