Sherpa3D: Boosting High-Fidelity Text-to-3D Generation via Coarse 3D Prior

Fangfu Liu¹, Diankun Wu¹, Yi Wei¹, Yongming Rao², Yueqi Duan¹

¹Tsinghua University, ²BAAI

CVPR (2024)

Paper | ArXiv | Project Page | Code

Abstract: Recently, 3D content creation from text prompts has demonstrated remarkable progress by utilizing 2D and 3D diffusion models. While 3D diffusion models ensure great multi-view consistency, their ability to generate high-quality and diverse 3D assets is hindered by the limited 3D data. In contrast, 2D diffusion models find a distillation approach that achieves excellent generalization and rich details without any 3D data. However, 2D lifting methods suffer from inherent view-agnostic ambiguity thereby leading to serious multi-face Janus issues, where text prompts fail to provide sufficient guidance to learn coherent 3D results. Instead of retraining a costly viewpoint-aware model, we study how to fully exploit easily accessible coarse 3D knowledge to enhance the prompts and guide 2D lifting optimization for refinement. In this paper, we propose Sherpa3D, a new text-to-3D framework that achieves high-fidelity, generalizability, and geometric consistency simultaneously. Specifically, we design a pair of guiding strategies derived from the coarse 3D prior generated by the 3D diffusion model: a structural guidance for geometric fidelity and a semantic guidance for 3D coherence. Employing the two types of guidance, the 2D diffusion model enriches the 3D content with diversified and high-quality results. Extensive experiments show the superiority of our Sherpa3D over the state-of-the-art text-to-3D methods in terms of quality and 3D consistency.

Installation

Clone Sherpa3D.

git clone https://github.com/liuff19/Sherpa3D.git
cd Sherpa3D

Create the environment, here we show an example using conda.

conda create -n Sherpa3D python=3.9
conda activate Sherpa3D

Install PyTorch >= 1.12. We have tested on torch1.12.1+cu113.

pip install torch==1.12.1+cu113 torchvision==0.13.1+cu113 --extra-index-url https://download.pytorch.org/whl/cu113

Install dependencies:

pip install -r requirements.txt

Download finetuned shap-E model.

cd load/shapE-finetuned/
bash download.sh

(Optional) If you are experiencing unstable connections with Hugging Face, we suggest you downloading the guidance model you used to a local folder following here and here, and set model_key of the guidance and the prompt processor to the local path.

Training

Quick Start

We provide some scripts and pre-defined configs to automatically run the geometry modeling stage and appearance modeling stage. For a quick start, you can run:

# Prompt: A head of Terracotta Army
cd bash
bash terracotta_army_1gpu.sh

# Prompt: A detailed and realistic 3D model of a vintage camera
cd bash
bash camera_4gpu.sh

We also provide scripts for some other prompts, which you can find in the "./bash" directory. The outputs will be saved in "./outputs/geometry" and "./outputs/appearance" respectively.

Custom training

To perform custom training, you need to create a config file first. We have provided some examples in the "./configs" directory.
After completing the configuration, you can start custom training. We provide the following two methods:

If you want to complete the training of both geometry and appearance in one go, please make sure your config files are named following the formats "xxx_geo.json" and "xxx_app.json", respectively. Then use the following commands to train:

cd bash
# using 1 GPU
general_1gpu xxx 
# using 4 GPU
general_4gpu xxx 
# using 8 GPU
general_8gpu xxx

If you want to train geometry and appearance respectively. You can use the following commands:

# Multi-GPU training (recommanded)
# Geometry modeling using 8 GPU 
python3 -m torch.distributed.launch --nproc_per_node=8 train.py --config path/to/your/config_geo/file
# Geometry modeling using 4 GPU
python3 -m torch.distributed.launch --nproc_per_node=4 train.py --config path/to/your/config_geo/file
# Appearance modeling using 8 GPU
python3 -m torch.distributed.launch --nproc_per_node=8 train.py --config path/to/your/config_app/file
# Appearance modeling using 4 GPU
python3 -m torch.distributed.launch --nproc_per_node=4 train.py --config path/to/your/config_app/file

# Single-GPU training
# Geometry modeling. 
python3 train.py --config path/to/your/config_geo/file
# Appearance modeling.
python3  train.py --config path/to/your/config_geo/file

Acknowledgement

This repo benefits from Fantasia3D, Threestudio, Shap-E and Cap3d. Thanks for their wonderful work.

BibTex

@misc{liu2023sherpa3d,
      title={Sherpa3D: Boosting High-Fidelity Text-to-3D Generation via Coarse 3D Prior}, 
      author={Fangfu Liu and Diankun Wu and Yi Wei and Yongming Rao and Yueqi Duan},
      year={2023},
      eprint={2312.06655},
      archivePrefix={arXiv},
      primaryClass={cs.CV}
}

Name		Name	Last commit message	Last commit date
Latest commit History 15 Commits
assets		assets
bash		bash
configs		configs
dataset		dataset
geometry		geometry
load		load
render		render
.gitignore		.gitignore
LICENSE		LICENSE
README.md		README.md
requirements.txt		requirements.txt
sd.py		sd.py
train.py		train.py
utils.py		utils.py

License

liuff19/Sherpa3D

Folders and files

Latest commit

History

Repository files navigation

Sherpa3D: Boosting High-Fidelity Text-to-3D Generation via Coarse 3D Prior

Fangfu Liu1, Diankun Wu1, Yi Wei1, Yongming Rao2, Yueqi Duan1

1Tsinghua University, 2BAAI

CVPR (2024)

Paper | ArXiv | Project Page | Code

Table of Contents

Installation

Training

Quick Start

Custom training

Acknowledgement

BibTex

About

Topics

Resources

License

Stars

Watchers

Forks

Languages

Fangfu Liu¹, Diankun Wu¹, Yi Wei¹, Yongming Rao², Yueqi Duan¹

¹Tsinghua University, ²BAAI