Panwang Pan†, Chenguo Lin†, Jingjing Zhao, Chenxin Li, Yuchen Lin, Haopeng Li, Honglei Yan, Kairun Wen, Yunlong Lin, Yixuan Yuan, Yadong Mu
This repository contains the official implementation of the paper: Diff4Splat: Controllable 4D Scene Generation with Latent Dynamic Reconstruction Models. Diff4Splat is a feed-forward method that synthesizes controllable and explicit 4D scenes from a single image. Our approach unifies the generative priors of video diffusion models with geometry and motion constraints learned from large-scale 4D datasets.
Given a single input image, a camera trajectory, and an optional text prompt, Diff4Splat directly predicts a deformable 3D Gaussian field that encodes appearance, geometry, and motion, all in a single forward pass, without test-time optimization or post-hoc refinement.
Please visit our Project Page for more details.
Feel free to contact us or open an issue if you have any questions or suggestions.
- 2025-11-01: Diff4Splat is released on arXiv.
- 2025-10-15: Initial codebase structure established.
- 2025-10-01: Project development started.
- Release inference scripts.
- Release training code and data preprocessing scripts.
- Release pretrained checkpoints.
- Provide a HuggingFace🤗 demo.
- Release preprocessed dataset.
Coming soon - Installation instructions will be provided once the codebase is ready for release.
Coming soon - Quick start examples will be provided once the inference scripts are released.
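In the meantime, here is a minimal sketch of what inference might look like. The `Diff4Splat` class, the `from_pretrained` loader, and all argument names below are hypothetical placeholders, not the released API:

```python
# Hypothetical quick-start sketch -- the actual API may differ once released.
import torch
from PIL import Image

# Assumed entry point; the real package/module name is not yet published.
from diff4splat import Diff4Splat

# Load a (hypothetical) pretrained checkpoint.
model = Diff4Splat.from_pretrained("diff4splat-base").to("cuda").eval()

image = Image.open("assets/example.png")

# A camera trajectory as a sequence of 4x4 world-to-camera matrices,
# one per output frame (here: 16 identity poses as a stand-in).
trajectory = torch.eye(4).repeat(16, 1, 1)

with torch.no_grad():
    # Single forward pass: returns a deformable 3D Gaussian field
    # encoding appearance, geometry, and motion.
    gaussians = model(
        image=image,
        camera_trajectory=trajectory,
        prompt="a car driving down a coastal road",  # optional text prompt
    )

# Render the predicted field along the input trajectory (hypothetical helper).
video = gaussians.render(trajectory)
```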
A CUDA-enabled GPU with at least 12 GB of VRAM is recommended. The model can be adapted to GPUs with less memory by reducing the resolution and batch size.
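As a rough illustration of that trade-off (the keys and default values below are assumptions, not the released configuration):

```python
# Hypothetical low-memory settings; the real config keys may differ.
low_vram_config = {
    "resolution": 256,   # down from a presumed 512 default
    "batch_size": 1,     # frames processed per forward chunk
    "num_frames": 16,    # shorter trajectories also reduce peak memory
}
```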
Dataset information and preprocessing scripts will be released along with the training code.
Training scripts and configurations will be released in the future.
Diff4Splat introduces a novel framework for controllable 4D scene generation:
- Video Latent Transformer: Augments video diffusion models to jointly capture spatio-temporal dependencies
- Deformable 3D Gaussian Field: Encodes appearance, geometry, and motion in a unified representation (see the data-structure sketch after this list)
- Single Forward Pass: Generates high-quality 4D scenes in approximately 30 seconds
- Controllable Generation: Supports camera trajectory and optional text prompts
- Explicit Representation: Produces deformable 3D Gaussian primitives
- Efficient Inference: No test-time optimization or post-hoc refinement required
- Multi-task Capability: Supports video generation, novel view synthesis, and geometry extraction
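To make the representation concrete, here is a minimal sketch of what a deformable 3D Gaussian field could look like as a data structure. The field names and the simple additive per-frame motion model are assumptions for illustration; the paper's exact parameterization may differ:

```python
from dataclasses import dataclass
import torch

@dataclass
class DeformableGaussianField:
    """Hypothetical container for N deformable 3D Gaussian primitives over T frames."""
    means: torch.Tensor      # (N, 3) canonical positions
    scales: torch.Tensor     # (N, 3) per-axis extents
    rotations: torch.Tensor  # (N, 4) unit quaternions
    opacities: torch.Tensor  # (N, 1)
    colors: torch.Tensor     # (N, 3) RGB, or SH coefficients, e.g. (N, K, 3)
    motion: torch.Tensor     # (T, N, 3) per-frame translation offsets

    def at_time(self, t: int) -> torch.Tensor:
        # Deform canonical means to frame t (assumed additive motion model).
        return self.means + self.motion[t]
```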
Diff4Splat demonstrates state-of-the-art performance across multiple tasks:
- Generates temporally consistent video sequences from single images
- Supports controllable camera trajectories
- Produces high-quality novel views from arbitrary camera positions
- Maintains geometric consistency across viewpoints
- Extracts accurate 3D geometry from generated scenes
- Enables downstream applications such as mesh reconstruction (see the sketch below)
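As a rough sketch of how explicit geometry could be pulled out of the predicted field for downstream reconstruction, assuming the hypothetical `DeformableGaussianField` layout above (the opacity threshold is likewise an illustrative assumption, not a published setting):

```python
import torch

def extract_point_cloud(field, t: int = 0, opacity_threshold: float = 0.5):
    """Keep the centers of sufficiently opaque Gaussians at frame t."""
    keep = field.opacities.squeeze(-1) > opacity_threshold
    points = field.at_time(t)[keep]  # (M, 3) deformed positions
    colors = field.colors[keep]      # matching per-point colors
    return points, colors
```

The resulting colored point cloud could then feed a standard surface-reconstruction pipeline (e.g., Poisson reconstruction) to obtain a mesh.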
- Repository setup and documentation
- Inference code release
- Pretrained model weights
- Training code release
- Dataset preprocessing scripts
- Comprehensive evaluation benchmarks
- Real-time inference optimization
- Multi-modal conditioning support
- Interactive demo applications
If you find our work helpful, please consider citing:
@article{pan2025diff4splat,
  title={Diff4Splat: Controllable 4D Scene Generation with Latent Dynamic Reconstruction Models},
  author={Pan, Panwang and Lin, Chenguo and Zhao, Jingjing and Li, Chenxin and Lin, Yuchen and Li, Haopeng and Yan, Honglei and Wen, Kairun and Lin, Yunlong and Yuan, Yixuan and others},
  journal={arXiv preprint arXiv:2511.00503},
  year={2025}
}
This project is licensed under the MIT License; see the LICENSE file for details.
We would like to thank the authors of PartCrafter, DiffSplat, and other related works for their inspiring research and open-source contributions that helped shape this project.
- MoVieS: Motion-Aware 4D Dynamic View Synthesis in One Second
- DiffSplat: Repurposing Image Diffusion Models for Scalable Gaussian Splat Generation
Note: This README is currently a placeholder. The actual implementation, inference scripts, and training code will be released according to the roadmap above.
