We study the 3D-aware image attribute editing problem, which has wide practical applications. Recent methods solve it by training a shared encoder that maps images into a 3D generator's latent space, or by per-image latent code optimization, and then edit images in that latent space. Despite promising results near the input view, they still suffer from 3D inconsistency at large camera poses and from imprecise editing that affects unspecified attributes. For efficient image inversion, we train a single shared encoder for all images. To alleviate 3D inconsistency at large camera poses, we propose two novel techniques, an alternating training scheme and a multi-view identity loss, that maintain 3D consistency and subject identity. We attribute imprecise editing to the gap between the latent space of real images and that of generated images. Comparing the latent space and the inversion manifold of GAN models, we demonstrate that editing in the inversion manifold achieves better results in both quantitative and qualitative evaluations. Extensive experiments show that our method produces more 3D-consistent images and more precise edits than previous work.
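The multi-view identity loss mentioned above can be sketched as follows: render the inverted code under several camera poses and penalize identity drift from the input image. All names here are illustrative; `toy_embed` stands in for a real face-recognition embedder (the paper uses IR-SE50), and this is a schematic sketch, not the authors' implementation.

```python
import numpy as np

def cosine_sim(a, b):
    """Cosine similarity between two embedding vectors."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

def multi_view_id_loss(embed_fn, source_image, rendered_views):
    """Average identity distance between the input image and renderings
    of the inverted code at several camera poses (illustrative sketch)."""
    src = embed_fn(source_image)
    losses = [1.0 - cosine_sim(src, embed_fn(v)) for v in rendered_views]
    return sum(losses) / len(losses)

# Toy embedder: flatten and L2-normalize. A real system would use a
# pretrained identity network such as IR-SE50 instead.
def toy_embed(img):
    v = img.reshape(-1)
    return v / np.linalg.norm(v)

rng = np.random.default_rng(0)
img = rng.normal(size=(8, 8))                     # stand-in input image
views = [img + 0.01 * rng.normal(size=(8, 8)) for _ in range(3)]  # stand-in renderings
loss = multi_view_id_loss(toy_embed, img, views)
```

Renderings close to the input in identity yield a loss near zero; views that drift in identity increase it.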
Official implementation of the paper "PREIM3D: 3D Consistent Precise Image Attribute Editing from a Single Image". PREIM3D reconstructs texture and geometry from a single real image within one second and supports applying a sequence of attribute edits.
We propose a pipeline, termed PREIM3D, that efficiently enables PRecise Editing in the Inversion Manifold with 3D consistency.
The environment can be set up from the provided environment.yml:

```
conda env create -f environment.yml
```
Please download our pretrained models from the following links and put them in `./pretrained`.
| Path | Description |
|---|---|
| PREIM3D FFHQ | PREIM3D inversion encoder trained on FFHQ. |
We also provide the other models needed for inference and training; put them in `./pretrained` as well.
| Path | Description |
|---|---|
| EG3D Generator | EG3D generator model pretrained on FFHQ, taken from EG3D, with 512x512 output resolution. |
| IR-SE50 Model | Pretrained IR-SE50 model taken from TreB1eN, used in our ID loss during training. |
| MOCOv2 Model | Pretrained ResNet-50 model trained with MOCOv2, used in our similarity loss for domains other than human faces during training. |
| Editing Directions | Portrait attribute editing directions computed in the inversion manifold using PREIM3D. |
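Applying one of these directions amounts to shifting an inverted latent code along the direction vector, scaled by an edit strength. The sketch below is schematic, with illustrative names and shapes, not the repository's API:

```python
import numpy as np

def apply_edit(latent, direction, strength):
    """Shift a latent code along a unit-normalized attribute direction.

    latent    : inverted code, shape (dim,)
    direction : attribute editing direction, same shape
    strength  : signed scalar controlling edit magnitude and sign
    """
    d = direction / np.linalg.norm(direction)
    return latent + strength * d

# Toy example with a hypothetical 4-dim latent and "smile" direction.
latent = np.zeros(4)
smile_direction = np.array([1.0, 0.0, 0.0, 0.0])
edited = apply_edit(latent, smile_direction, 2.0)
```

Sequential editing, as supported by PREIM3D, corresponds to applying several such shifts one after another.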
FFHQ: Download and preprocess the Flickr-Faces-HQ dataset following EG3D. Note that the cropping differs from the original FFHQ.
Custom dataset: You can process your own dataset following EG3D; see EG3D for alignment details.
We provide some test images in `./data/test`, so you can quickly try the model on them.
You can use `inference.sh` to apply the model to a set of images:

```
./inference.sh
```
We provide a web demo as an interactive editing tool. To start it, run:

```
python web_demo.py
```
We train PREIM3D on FFHQ cropped following EG3D. Please configure the dataset paths in `configs/paths_config.py` and `configs/data_configs.py`. You can use `train.sh` to train the model:

```
./train.sh
```
If you want to finetune an existing model, add the `--checkpoint_path` option.
Thanks to omertov and nvlabs for sharing their code.
If you use this code for your research, please cite:
```
@InProceedings{Li_2023_CVPR,
    author    = {Li, Jianhui and Li, Jianmin and Zhang, Haoji and Liu, Shilong and Wang, Zhengyi and Xiao, Zihao and Zheng, Kaiwen and Zhu, Jun},
    title     = {PREIM3D: 3D Consistent Precise Image Attribute Editing From a Single Image},
    booktitle = {Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)},
    month     = {June},
    year      = {2023},
    pages     = {8549-8558}
}
```