
V3D: Video Diffusion Models are Effective 3D Generators

Zilong Chen¹·², Yikai Wang¹, Feng Wang¹, Zhengyi Wang¹·², Huaping Liu¹

¹Tsinghua University, ²ShengShu

This repository contains the official implementation of V3D: Video Diffusion Models are Effective 3D Generators.

What's New

[2024.3.14] Our demo is now available [here](https://huggingface.co/spaces/heheyas/V3D). We will add more checkpoints and examples soon.

[Work in Progress]

We are working on making everything publicly available (refactoring code, uploading weights, etc.); please be patient.

Video results

Single Image to 3D

Generated Multi-views

000413.mp4, 000183.mp4

Reconstructed 3D Gaussian Splats

1a47fe68-a.mp4, 2b391cd1-2.mp4, 5f24e598-7.mp4, 182a7b56-9.mp4, d8181e41-e.mp4, e4c98179-6.mp4

Sparse view scene generation (on the CO3D hydrant category)

hydrant_1.mp4, hydrant_2.mp4, hydrant_3.mp4, hydrant_4.mp4, hydrant_5.mp4

Instructions:

1. Install the requirements:

   ```bash
   pip install -r requirements.txt
   ```

2. Download the V3D weights and the Stable Video Diffusion base weights (a Python alternative using `huggingface_hub` is sketched after this list):

   ```bash
   wget https://huggingface.co/heheyas/V3D/resolve/main/V3D.ckpt -O ckpts/V3D_512.ckpt
   wget https://huggingface.co/stabilityai/stable-video-diffusion-img2vid-xt/resolve/main/svd_xt.safetensors -O ckpts/svd_xt.safetensors
   ```

3. Run the V3D video diffusion model to generate dense multi-views:

   ```bash
   PYTHONPATH="." python scripts/pub/V3D_512.py --input_path <image file or dir> --save --border_ratio 0.3 --min_guidance_scale 4.5 --max_guidance_scale 4.5 --output-folder <output-dest>
   ```

4. Reconstruct 3D assets from the generated multi-views using 3D Gaussian Splatting:

   ```bash
   PYTHONPATH="." python recon/train_from_vid.py -w --sh_degree 0 --iterations 4000 --lambda_dssim 1.0 --lambda_lpips 2.0 --save_iterations 4000 --num_pts 100_000 --video <your generated video>
   ```
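As mentioned in step 2, the two checkpoints can also be fetched programmatically. Below is a minimal sketch using the `huggingface_hub` client; the repo IDs and filenames are taken from the wget URLs above, and the rename simply mirrors the `-O ckpts/V3D_512.ckpt` flag:

```python
# Sketch: fetch both checkpoints with huggingface_hub instead of wget.
# Repo IDs and filenames come from the wget URLs in step 2.
from pathlib import Path

from huggingface_hub import hf_hub_download

ckpt_dir = Path("ckpts")
ckpt_dir.mkdir(exist_ok=True)

# V3D checkpoint; renamed to match what the scripts expect (V3D_512.ckpt).
v3d_path = hf_hub_download(repo_id="heheyas/V3D", filename="V3D.ckpt", local_dir=ckpt_dir)
Path(v3d_path).rename(ckpt_dir / "V3D_512.ckpt")

# Stable Video Diffusion img2vid-xt base weights.
hf_hub_download(
    repo_id="stabilityai/stable-video-diffusion-img2vid-xt",
    filename="svd_xt.safetensors",
    local_dir=ckpt_dir,
)
```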

Alternatively, reconstruct a mesh with NeuS via instant-nsr-pl (instead of 3D Gaussian Splatting in step 4):

```bash
cd mesh_recon
PYTHONPATH="." python launch.py --config configs/videonvs.yaml --gpu <gpu> --train system.loss.lambda_normal=0.1 dataset.scene=<scene_name> dataset.root_dir=<output_dir> dataset.img_wh='[512, 512]'
```
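The NeuS route consumes the generated views as individual images rather than a video. A hedged sketch for splitting a generated .mp4 into per-frame PNGs with `imageio` follows; the `dataset_root/<scene_name>/images` layout is an assumption, so check `configs/videonvs.yaml` for the layout it actually expects:

```python
# Sketch: split a generated multi-view video into per-frame PNGs.
# The dataset_root/<scene_name>/images layout is an assumption, not
# the confirmed layout read by configs/videonvs.yaml.
from pathlib import Path

import imageio.v2 as imageio  # reading .mp4 also needs imageio-ffmpeg

video = Path("outputs/000413.mp4")           # your generated video
out_dir = Path("dataset_root/scene/images")  # hypothetical layout
out_dir.mkdir(parents=True, exist_ok=True)

reader = imageio.get_reader(video)
for i, frame in enumerate(reader):
    imageio.imwrite(out_dir / f"{i:03d}.png", frame)
reader.close()
```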

Refine the texture:

```bash
python refine.py --mesh <your obj mesh file> --scene <your video> --num-opt 16 --lpips 1.0 --iters 500
```
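If you want to run the whole single-image-to-3D pipeline in one go, the commands above can be chained from a small Python driver. This is a sketch, not an official script: every concrete path here (`inputs/chair.png`, `outputs/`, `mesh.obj`, the generated video name) is an illustrative assumption.

```python
# Sketch: chain the pipeline steps by shelling out to the commands
# above. All concrete paths are illustrative assumptions.
import os
import subprocess

env = {**os.environ, "PYTHONPATH": "."}

def run(*cmd: str) -> None:
    """Run one pipeline step, failing loudly on a non-zero exit."""
    subprocess.run(list(cmd), check=True, env=env)

# 1. Generate dense multi-views from a single image.
run("python", "scripts/pub/V3D_512.py",
    "--input_path", "inputs/chair.png", "--save",
    "--border_ratio", "0.3",
    "--min_guidance_scale", "4.5", "--max_guidance_scale", "4.5",
    "--output-folder", "outputs")

# 2. Reconstruct 3D Gaussian splats from the generated video
#    (the video filename is an assumption; use your actual output).
run("python", "recon/train_from_vid.py", "-w",
    "--sh_degree", "0", "--iterations", "4000",
    "--lambda_dssim", "1.0", "--lambda_lpips", "2.0",
    "--save_iterations", "4000", "--num_pts", "100_000",
    "--video", "outputs/chair.mp4")

# 3. Optionally refine the texture of a mesh from the NeuS route.
run("python", "refine.py",
    "--mesh", "mesh.obj", "--scene", "outputs/chair.mp4",
    "--num-opt", "16", "--lpips", "1.0", "--iters", "500")
```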

Acknowledgement

This code base is built upon the following awesome open-source projects:

- Stable Video Diffusion
- 3D Gaussian Splatting
- instant-nsr-pl (NeuS)

Thanks to the authors for their remarkable work!
