Figure: Overview of our modularized pipeline for 3D-aware image synthesis, which modularizes the generation process in a universal way. Each module can be improved independently, facilitating algorithm development. Note that the discriminator is omitted for simplicity.
# Benchmarking and Analyzing 3D-aware Image Synthesis with a Modularized Codebase

**Qiuyu Wang, Zifan Shi, Kecheng Zheng, Yinghao Xu, Sida Peng, Yujun Shen**

NeurIPS 2023 Datasets and Benchmarks Track

[[Paper](https://arxiv.org/abs/2306.12423)]
## Supported Modules

The codebase modularizes the 3D-aware generation pipeline into eight modules, each of which can be improved independently (see the figure above).
## Installation

Our code is tested with Python 3.8, CUDA 11.3, and PyTorch 1.11.0.
- Install package requirements via `conda`:

  ```shell
  conda create -n <ENV_NAME> python=3.8  # create virtual environment with Python 3.8
  conda activate <ENV_NAME>
  conda install pytorch==1.11.0 torchvision==0.12.0 cudatoolkit=11.3 -c pytorch  # install PyTorch 1.11.0
  pip install -r requirements.txt  # install dependencies
  ```
- Our code requires [nvdiffrast](https://github.com/NVlabs/nvdiffrast); please refer to its documentation for instructions on how to install it (a typical install is also sketched after this list).
- Our code also uses a face reconstruction model to evaluate metrics. Please refer to this guide to prepare the prerequisite models.
- To use a video visualizer (optional), please also install `ffmpeg`.
  - Ubuntu: `sudo apt-get install ffmpeg`
  - macOS: `brew install ffmpeg`
- To reduce the memory footprint (optional), you can switch to either `jemalloc` (recommended) or `tcmalloc` rather than your default memory allocator (see the usage sketch after this list).
  - jemalloc (recommended):
    - Ubuntu: `sudo apt-get install libjemalloc`
  - tcmalloc:
    - Ubuntu: `sudo apt-get install google-perftools`
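In case it helps, a typical nvdiffrast installation from source looks like the sketch below; this assumes a working CUDA build toolchain and is not a substitute for the official nvdiffrast documentation:

```shell
# Install nvdiffrast from source (requires a working CUDA build environment).
git clone https://github.com/NVlabs/nvdiffrast.git
cd nvdiffrast
pip install .
```

To actually run a command with an alternative allocator, one common approach is to preload the corresponding shared library via `LD_PRELOAD`. The library paths below are typical Ubuntu locations and may differ on your system:

```shell
# jemalloc: locate your library with `ldconfig -p | grep jemalloc`.
LD_PRELOAD=/usr/lib/x86_64-linux-gnu/libjemalloc.so.2 <YOUR_TRAINING_COMMAND>

# tcmalloc: locate your library with `ldconfig -p | grep tcmalloc`.
LD_PRELOAD=/usr/lib/x86_64-linux-gnu/libtcmalloc.so.4 <YOUR_TRAINING_COMMAND>
```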
## Data Preparation

- FFHQ and ShapeNet Cars: Please refer to this guide to prepare the datasets.
- Cats: Please refer to this guide to prepare the dataset.
## Train EG3D on FFHQ at a Resolution of 512x512

In your Terminal, run:
```shell
./scripts/training_demos/eg3d_ffhq512.sh <NUM_GPUS> <PATH_TO_DATA> [OPTIONS]
```
where

- `<NUM_GPUS>` refers to the number of GPUs. Setting `<NUM_GPUS>` to 1 helps launch a training job on single-GPU platforms.
- `<PATH_TO_DATA>` refers to the path of the FFHQ dataset (in a resolution of 512x512) in `zip` format. If running on local machines, a soft link of the data will be created under the `data` folder of the working directory to save disk space.
- `[OPTIONS]` refers to any additional options to pass. Detailed instructions on available options can be shown via `./scripts/training_demos/eg3d_ffhq512.sh <NUM_GPUS> <PATH_TO_DATA> --help`.
This demo script uses `eg3d_ffhq512` as the default value of `job_name`, which is particularly used to identify experiments. Concretely, a directory named `job_name` will be created under the root working directory (which is set to `work_dirs/` by default). To prevent overwriting previous experiments, an exception will be raised to interrupt the training if the `job_name` directory already exists. To change the job name, please use the `--job_name=<NEW_JOB_NAME>` option.
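As a concrete (purely illustrative) example, launching on 8 GPUs with the dataset zip stored at `data/ffhq512.zip` and a custom job name might look like:

```shell
# Both the data path and the job name below are hypothetical.
./scripts/training_demos/eg3d_ffhq512.sh 8 data/ffhq512.zip --job_name=eg3d_ffhq512_run2
```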
Other 3D GAN models reproduced by our codebase can be trained similarly; please refer to the scripts under `./scripts/training_demos/` for more details.
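The available demo scripts can be listed directly from the repository root (the exact names depend on your checkout):

```shell
ls ./scripts/training_demos/
```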
To investigate the effect of various point embedders (coordinate-, volume-, and tri-plane-based, respectively), one can use the following commands to train the models:
```shell
./scripts/training_demos/ablation3d.sh <NUM_GPUS> <PATH_TO_DATA> --job_name <YOUR_JOB_NAME> --root_work_dir <YOUR_ROOT_DIR> --ref_mode 'coordinate' --use_positional_encoding false --mlp_type 'stylenerf' --mlp_depth 16 --mlp_hidden_dim 128 --mlp_output_dim 64 --r1_gamma 1.5

./scripts/training_demos/ablation3d.sh <NUM_GPUS> <PATH_TO_DATA> --job_name <YOUR_JOB_NAME> --root_work_dir <YOUR_ROOT_DIR> --ref_mode 'volume' --fv_feat_res 64 --use_positional_encoding false --mlp_type 'stylenerf' --mlp_depth 16 --mlp_hidden_dim 128 --mlp_output_dim 64 --r1_gamma 1.5

./scripts/training_demos/ablation3d.sh <NUM_GPUS> <PATH_TO_DATA> --job_name <YOUR_JOB_NAME> --root_work_dir <YOUR_ROOT_DIR> --ref_mode 'triplane' --fv_feat_res 64 --use_positional_encoding false --mlp_type 'eg3d' --mlp_depth 2 --mlp_hidden_dim 64 --mlp_output_dim 32 --r1_gamma 1.5
```
Besides using TensorBoard to track the training process, the raw results (e.g., training losses and running time) are saved in JSON Lines format. They can be easily inspected with the following script:

```python
import json

file_name = '<PATH_TO_WORK_DIR>/log.json'

data_entries = []
with open(file_name, 'r') as f:
    for line in f:
        data_entry = json.loads(line)
        data_entries.append(data_entry)

# An example of a data entry:
# {"Loss/D Fake": 0.4833524551040682, "Loss/D Real": 0.4966000154727226, "Loss/G": 1.1439273656869773, "Learning Rate/Discriminator": 0.002352941082790494, "Learning Rate/Generator": 0.0020000000949949026, "data time": 0.0036810599267482758, "iter time": 0.24490128830075264, "run time": 66108.140625}
```
## Inference

After training a model, one can employ the following script to run inference and visualize the results, including images, videos, and geometries.
```shell
CUDA_VISIBLE_DEVICES=0 python test_3d_inference.py --model <PATH_TO_MODEL> --work_dir <PATH_TO_WORK_DIR> --save_image true --save_video false --save_shape true --shape_res 512 --num 10 --truncation_psi 0.7
```
## Evaluation

After training a model, one can use the following script to evaluate various metrics, including FID, face identity consistency (ID), depth error (DE), pose error (PE), and reprojection error (RE).
```shell
python -m torch.distributed.launch --nproc_per_node=1 test_3d_metrics.py --dataset <PATH_TO_DATA> --model <PATH_TO_MODEL> --test_fid true --align_face true --test_identity true --test_reprojection_error true --test_pose true --test_depth true --fake_num 1000
```
## TODO

- Upload pretrained checkpoints
- User Guide
## Acknowledgement

This repository is built upon Hammer, on top of which we reimplement GRAF, GIRAFFE, π-GAN, StyleSDF, StyleNeRF, VolumeGAN, GRAM, EpiGRAF, and EG3D.
## BibTeX

```bibtex
@article{wang2023benchmarking,
  title   = {Benchmarking and Analyzing 3D-aware Image Synthesis with a Modularized Codebase},
  author  = {Wang, Qiuyu and Shi, Zifan and Zheng, Kecheng and Xu, Yinghao and Peng, Sida and Shen, Yujun},
  journal = {arXiv preprint arXiv:2306.12423},
  year    = {2023}
}
```