SparseFusion: Distilling View-conditioned Diffusion for 3D Reconstruction
Zhizhuo Zhou, Shubham Tulsiani
CVPR '23 | GitHub | arXiv | Project page
SparseFusion reconstructs a consistent and realistic 3D neural scene representation from as few as 2 input images with known relative pose. SparseFusion is able to generate detailed and plausible structures in uncertain or unobserved regions (such as the front of the hydrant, the teddybear's face, the back of the laptop, or the left side of the toybus).
This project is built on top of open-source code. We thank the open-source research community and credit our use of parts of Stable Diffusion, Imagen Pytorch, and torch-ngp below.
Our code release contains:
- Code for inference
- Code for training
- Pretrained weights for 10 categories
For bugs and issues, please open an issue on GitHub and I will try to address it promptly.
Please follow the environment setup guide in ENVIRONMENT.md.
We provide two dataset options: the original CO3Dv2 dataset and a heavily cut-down toy dataset intended for demonstration purposes only. Please download at least one of them.
- (optional) Download the CO3Dv2 dataset (5.5TB) here and follow the instructions to extract it to a folder. We assume the default location to be data/co3d/{category_name}.
- Download the evaluation-only toy CO3Dv2 dataset (6.7GB) here and place it in a folder. We assume the default location to be data/co3d_toy/{category_name}.
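The default dataset layout can be checked before running anything. The snippet below is a minimal sketch, not part of the SparseFusion code: the helper name category_dir and the DEFAULT_ROOTS mapping are hypothetical and simply encode the two default locations listed above, with an optional root override playing the role of -r, --root.

```python
from pathlib import Path
from typing import Optional

# Hypothetical mapping from -d/--dataset_name values to the default roots
# described above (not taken from the SparseFusion source).
DEFAULT_ROOTS = {
    "co3d": Path("data/co3d"),
    "co3d_toy": Path("data/co3d_toy"),
}


def category_dir(dataset_name: str, category: str, root: Optional[str] = None) -> Path:
    """Return the folder expected to hold one CO3D category.

    If root is given (as with -r/--root), it overrides the default location.
    """
    if root is not None:
        base = Path(root)
    elif dataset_name in DEFAULT_ROOTS:
        base = DEFAULT_ROOTS[dataset_name]
    else:
        raise ValueError(f"unknown dataset_name: {dataset_name!r}")
    return base / category


if __name__ == "__main__":
    d = category_dir("co3d_toy", "hydrant")
    print(d, "exists" if d.is_dir() else "is missing")
```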
SparseFusion requires both SparseFusion weights and Stable Diffusion VAE weights.
- Find the SparseFusion weights here. Download them and place them in checkpoints/sf/{category_name}.
- Download the Stable Diffusion v-1-3 weights here and rename sd-v1-3.ckpt to sd-v1-3-vae.ckpt. While our code is compatible with the default downloaded weights, we only use the VAE weights from Stable Diffusion. We assume the default location and filename of the VAE checkpoint to be checkpoints/sd/sd-v1-3-vae.ckpt.
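Since only the VAE part of the Stable Diffusion checkpoint is used, one could optionally strip the file down to those weights. This is a sketch under an assumption: it relies on the standard Stable Diffusion v1 checkpoint layout, where the VAE (the first_stage_model of the latent diffusion model) lives under keys prefixed with first_stage_model. inside the state_dict entry. The function name is hypothetical, and the stripped file has not been checked against SparseFusion's loader; renaming the full checkpoint as described above remains the documented path.

```python
from typing import Any, Dict

# Assumption: Stable Diffusion v1 checkpoints store the VAE weights under this
# key prefix inside checkpoint["state_dict"].
VAE_PREFIX = "first_stage_model."


def extract_vae_weights(checkpoint: Dict[str, Any]) -> Dict[str, Any]:
    """Return a checkpoint dict that keeps only the VAE entries.

    Key names and the top-level "state_dict" wrapper are preserved so the
    result has the same layout as the original file.
    """
    state_dict = checkpoint.get("state_dict", checkpoint)
    vae_only = {k: v for k, v in state_dict.items() if k.startswith(VAE_PREFIX)}
    if not vae_only:
        raise KeyError(f"no keys start with {VAE_PREFIX!r}")
    return {"state_dict": vae_only}


# Usage with real weights (requires PyTorch; paths follow the defaults above):
# import torch
# ckpt = torch.load("sd-v1-3.ckpt", map_location="cpu")
# torch.save(extract_vae_weights(ckpt), "checkpoints/sd/sd-v1-3-vae.ckpt")
```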
To run evaluation, assuming the CO3D toy dataset and model weights are in the default paths specified above, simply pass in -d, --dataset_name and -c, --category:
$ python demo.py -d co3d_toy -c hydrant
To evaluate on specific scenes, pass the desired scene indices as a comma-separated list (e.g., 0,5,7) to -i, --idx.
$ python demo.py -d co3d_toy -c hydrant -i 0,5,7
To set the number of input views, pass -v, --input_views.
$ python demo.py -d co3d_toy -c hydrant -i 0,5,7 -v 3
To use a custom dataset root location, pass -r, --root.
$ python demo.py -d co3d_toy -r data/co3d_toy -c hydrant -i 0,5,7 -v 3
To use custom model checkpoints, pass --eft, --vldm, and --vae.
$ python demo.py -d co3d_toy -r data/co3d_toy -c hydrant -i 0,5,7 -v 3 \
--eft checkpoints/sf/hydrant/ckpt_latest_eft.pt \
--vldm checkpoints/sf/hydrant/ckpt_latest.pt \
--vae checkpoints/sd/sd-v1-3-vae.ckpt
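To evaluate several categories with their own checkpoints, the custom-checkpoint command above can be generated programmatically. This is a minimal sketch that uses only the documented flags; the helper name demo_command and the loop over categories are illustrative, and the checkpoint filenames simply follow the hydrant example.

```python
from typing import List, Sequence


def demo_command(category: str, idx: Sequence[int], input_views: int = 2,
                 dataset: str = "co3d_toy", root: str = "data/co3d_toy",
                 sf_dir: str = "checkpoints/sf",
                 vae: str = "checkpoints/sd/sd-v1-3-vae.ckpt") -> List[str]:
    """Build a demo.py argument list for one category (hypothetical helper)."""
    ckpt_dir = f"{sf_dir}/{category}"
    return [
        "python", "demo.py",
        "-d", dataset,
        "-r", root,
        "-c", category,
        "-i", ",".join(str(i) for i in idx),  # comma-separated scene indices
        "-v", str(input_views),
        "--eft", f"{ckpt_dir}/ckpt_latest_eft.pt",
        "--vldm", f"{ckpt_dir}/ckpt_latest.pt",
        "--vae", vae,
    ]


if __name__ == "__main__":
    for cat in ["hydrant", "teddybear"]:
        cmd = demo_command(cat, idx=[0, 5, 7], input_views=3)
        print(" ".join(cmd))
        # To actually run it (requires the repo, data and weights):
        # import subprocess; subprocess.run(cmd, check=True)
```

Passing an argument list rather than a single shell string avoids quoting issues when paths contain spaces.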
To use the original CO3Dv2 dataset, pass co3d to -d, --dataset_name and also pass the dataset root location to -r, --root.
$ python demo.py -d co3d -r data/co3d/ -c hydrant -i 0
The full list of evaluation flags for demo.py:
-g, --gpus number of GPUs to use (default: 1)
-p, --port last digit of DDP port (default: 1)
-d, --dataset_name name of dataset (default: co3d_toy)
-r, --root root directory of the dataset
-c, --category CO3D category
-v, --input_views number of random input views (default: 2)
-i, --idx scene indices to evaluate (default: 0)
-e, --eft path to the EFT checkpoint
-l, --vldm path to the VLDM checkpoint
-a, --vae path to the Stable Diffusion VAE checkpoint
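For reference, the documented evaluation flags and defaults can be written down as an argparse parser. This is a hypothetical re-creation from the flag list above, not demo.py's actual parser; in particular, splitting --idx on commas into integers (parse_indices) is an assumption based on the 0,5,7 examples.

```python
import argparse
from typing import List


def parse_indices(text: str) -> List[int]:
    """Turn "0,5,7" into [0, 5, 7] (assumed format of -i/--idx)."""
    return [int(tok) for tok in text.split(",") if tok.strip()]


def build_parser() -> argparse.ArgumentParser:
    # Hypothetical mirror of the documented demo.py flags; defaults copied from the list.
    p = argparse.ArgumentParser(description="SparseFusion demo flags (sketch)")
    p.add_argument("-g", "--gpus", type=int, default=1, help="number of GPUs to use")
    p.add_argument("-p", "--port", type=int, default=1, help="last digit of DDP port")
    p.add_argument("-d", "--dataset_name", default="co3d_toy", help="name of dataset")
    p.add_argument("-r", "--root", default=None, help="root directory of the dataset")
    p.add_argument("-c", "--category", help="CO3D category")
    p.add_argument("-v", "--input_views", type=int, default=2,
                   help="number of random input views")
    p.add_argument("-i", "--idx", type=parse_indices, default=[0],
                   help="comma-separated scene indices to evaluate")
    p.add_argument("-e", "--eft", default=None, help="path to the EFT checkpoint")
    p.add_argument("-l", "--vldm", default=None, help="path to the VLDM checkpoint")
    p.add_argument("-a", "--vae", default=None,
                   help="path to the Stable Diffusion VAE checkpoint")
    return p
```

For example, build_parser().parse_args(["-c", "hydrant", "-i", "0,5,7"]).idx evaluates to [0, 5, 7], while omitting -i leaves the default [0].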
Output artifacts (images, GIFs, and torch-ngp checkpoints) are saved to output/demo/ by default.
Early-access training code is provided in train.py. Please follow the evaluation tutorial above to set up the environment and the pretrained VAE weights. We recommend directly modifying train.py to specify the experiment directory and set the training hyperparameters. The training flags are listed below.
-g, --gpus number of GPUs to use (default: 1)
-p, --port last digit of DDP port (default: 1)
-d, --dataset_name name of dataset (default: co3d_toy)
-r, --root root directory of the dataset
-c, --category CO3D category
-a, --vae path to the Stable Diffusion VAE checkpoint
-b, --backend distributed data parallel backend (default: nccl)
To train on a custom dataset, one needs to write a custom dataloader whose __getitem__ function returns a dictionary containing:
{
'images': (B, 3, H, W) image tensor,
'R': (B, 3, 3) PyTorch3D rotation,
'T': (B, 3) PyTorch3D translation,
'f': (B, 2) PyTorch3D focal_length in NDC space,
'c': (B, 2) PyTorch3D principal_point in NDC space,
'valid_region': (B, 1, H, W) binary tensor where 1 denotes valid image region,
'image_size': (B, 2) image size
}
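A custom dataset usually starts from pixel-space pinhole intrinsics and OpenCV-style world-to-camera extrinsics, so the PyTorch3D fields above have to be converted. The sketch below reproduces the conversion used by PyTorch3D's cameras_from_opencv_projection (NDC scaled so the shorter image side spans [-1, 1], rotation transposed, x/y axes flipped) in plain Python for a single view, plus a shape check for the dictionary above that works with any array type exposing .shape. The function names are hypothetical, and reading B as the number of views sampled from one scene and image_size as (H, W) are assumptions.

```python
from typing import Any, Dict, List, Sequence, Tuple

Matrix = List[List[float]]


def opencv_to_pytorch3d(R: Matrix, t: Sequence[float], K: Matrix,
                        image_size_hw: Tuple[int, int]):
    """Convert one OpenCV camera (x_cam = R @ x_world + t, pixel intrinsics K)
    into PyTorch3D-style R, T, focal_length and principal_point in NDC space."""
    h, w = image_size_hw
    scale = min(h, w) / 2.0  # the shorter side spans [-1, 1] in NDC
    fx, fy, px, py = K[0][0], K[1][1], K[0][2], K[1][2]
    focal = [fx / scale, fy / scale]
    principal = [-(px - w / 2.0) / scale, -(py - h / 2.0) / scale]
    # PyTorch3D right-multiplies points, so R is transposed; its screen x/y axes
    # point opposite to OpenCV's, so the first two columns of the transposed R
    # and the first two entries of T are negated.
    R_p3d = [[R[j][i] for j in range(3)] for i in range(3)]
    for row in R_p3d:
        row[0], row[1] = -row[0], -row[1]
    T_p3d = [-t[0], -t[1], t[2]]
    return R_p3d, T_p3d, focal, principal


# Expected shape per key; strings are symbolic sizes that must agree across keys.
EXPECTED = {
    "images": ("B", 3, "H", "W"),
    "R": ("B", 3, 3),
    "T": ("B", 3),
    "f": ("B", 2),
    "c": ("B", 2),
    "valid_region": ("B", 1, "H", "W"),
    "image_size": ("B", 2),
}


def check_sample(sample: Dict[str, Any]) -> Dict[str, int]:
    """Check a __getitem__ output against the spec; return the bound B/H/W sizes."""
    dims: Dict[str, int] = {}
    for key, pattern in EXPECTED.items():
        if key not in sample:
            raise KeyError(f"missing key {key!r}")
        shape = tuple(sample[key].shape)
        if len(shape) != len(pattern):
            raise ValueError(f"{key}: expected {len(pattern)} dims, got {shape}")
        for want, got in zip(pattern, shape):
            if isinstance(want, str):
                # First occurrence binds the symbol; later ones must match it.
                if dims.setdefault(want, got) != got:
                    raise ValueError(f"{key}: {want}={got} conflicts with {dims[want]}")
            elif want != got:
                raise ValueError(f"{key}: expected {pattern}, got {shape}")
    return dims
```

In a real dataloader the per-view outputs of opencv_to_pytorch3d would be stacked into (B, 3, 3), (B, 3) and (B, 2) tensors before calling check_sample on the returned dictionary.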
If you find this work useful, a citation will be appreciated via:
@inproceedings{zhou2023sparsefusion,
title={SparseFusion: Distilling View-conditioned Diffusion for 3D Reconstruction},
author={Zhizhuo Zhou and Shubham Tulsiani},
booktitle={CVPR},
year={2023}
}
We thank Naveen Venkat, Mayank Agarwal, Jeff Tan, Paritosh Mittal, Yen-Chi Cheng, and Nikolaos Gkanatsios for helpful discussions and feedback. We also thank David Novotny and Jonáš Kulhánek for sharing outputs of their work and helpful correspondence. This material is based upon work supported by the National Science Foundation Graduate Research Fellowship under Grant Nos. DGE1745016 and DGE2140739.
We also use parts of existing projects:
VAE from Stable Diffusion.
@misc{rombach2021highresolution,
title={High-Resolution Image Synthesis with Latent Diffusion Models},
author={Robin Rombach and Andreas Blattmann and Dominik Lorenz and Patrick Esser and Björn Ommer},
year={2021},
eprint={2112.10752},
archivePrefix={arXiv},
primaryClass={cs.CV}
}
Diffusion model from Imagen Pytorch.
@misc{imagen-pytorch,
Author = {Phil Wang},
Year = {2022},
Note = {https://github.com/lucidrains/imagen-pytorch},
Title = {Imagen - Pytorch}
}
Instant NGP implementation from torch-ngp.
@misc{torch-ngp,
Author = {Jiaxiang Tang},
Year = {2022},
Note = {https://github.com/ashawkey/torch-ngp},
Title = {Torch-ngp: a PyTorch implementation of instant-ngp}
}