PixelFace+: Towards Controllable Face Generation and Manipulation with Text Descriptions and Segmentation Masks

By Xiaoxiong Du, Jun Peng, Yiyi Zhou, Jinlu Zhang, Siting Chen, Guannan Jiang, Xiaoshuai Sun, Rongrong Ji.

MM '23: Proceedings of the 31st ACM International Conference on Multimedia

DEMO VIDEO

Introduction

This repository is the PyTorch implementation of PixelFace+. PixelFace+ uses both mask and text features for highly controllable face generation and manipulation. We propose the GCMF module to achieve better decoupling. Additionally, to improve the alignment between generated images and text, we introduce a CLIP-based regularization loss. The framework diagram of PixelFace+ is provided in framwork.png.

Citation

@inproceedings{10.1145/3581783.3612067,
author = {Du, Xiaoxiong and Peng, Jun and Zhou, Yiyi and Zhang, Jinlu and Chen, Siting and Jiang, Guannan and Sun, Xiaoshuai and Ji, Rongrong},
title = {PixelFace+: Towards Controllable Face Generation and Manipulation with Text Descriptions and Segmentation Masks},
year = {2023},
isbn = {9798400701085},
publisher = {Association for Computing Machinery},
address = {New York, NY, USA},
url = {https://doi.org/10.1145/3581783.3612067},
doi = {10.1145/3581783.3612067},
pages = {4666–4677},
numpages = {12},
keywords = {controllable face generation, face editing},
series = {MM '23}
}

Prerequisites

python 3.6
pytorch 1.10.0
pytorch-fid 0.2.1
torchvision 0.11.1
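
A quick way to confirm that an environment matches the pins above is to compare installed versions against them. The snippet below is only a sketch: the helper names `normalize` and `find_mismatches` are hypothetical, and the mapping from the list above to pip package names (`torch`, `torchvision`, `pytorch-fid`) is an assumption.

```python
def normalize(version):
    """Drop a local build suffix such as '+cu113' from a version string."""
    return version.split("+", 1)[0]


# Pins copied from the Prerequisites list (pip names are an assumption).
REQUIRED = {
    "torch": "1.10.0",
    "torchvision": "0.11.1",
    "pytorch-fid": "0.2.1",
}


def find_mismatches(installed, required=REQUIRED):
    """Return {package: (installed, required)} for every package that differs.

    A package missing from `installed` is reported with None.
    """
    mismatches = {}
    for name, wanted in required.items():
        found = installed.get(name)
        if found is None or normalize(found) != wanted:
            mismatches[name] = (found, wanted)
    return mismatches
```

In a real environment, the `installed` dictionary could be filled from `torch.__version__` and `torchvision.__version__`; an empty result means every pinned package matches.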

Data preparation

Multi-Modal-CelebA-HQ Dataset [Link]

Before training, please download dataset2.json (provided in compressed form as dataset2.zip) and place it in the MMceleba dataset directory.
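
The unpacking step can also be scripted. The following is a minimal sketch using only the standard library; the function name `extract_dataset_json` is hypothetical, the archive is assumed to be named dataset2.zip and to contain a file called dataset2.json, and the dataset directory path is whatever you use locally.

```python
import zipfile
from pathlib import Path


def extract_dataset_json(zip_path, dataset_dir, member="dataset2.json"):
    """Extract `member` from the archive into `dataset_dir` and return its path."""
    dataset_dir = Path(dataset_dir)
    dataset_dir.mkdir(parents=True, exist_ok=True)
    with zipfile.ZipFile(zip_path) as archive:
        # Match by file name so the lookup still works if the archive
        # nests the file inside a folder (an assumption about its layout).
        matches = [n for n in archive.namelist() if Path(n).name == member]
        if not matches:
            raise FileNotFoundError(f"{member} not found in {zip_path}")
        target = dataset_dir / member
        # Write the bytes directly so any folder prefix in the archive is dropped.
        target.write_bytes(archive.read(matches[0]))
    return target
```

For example, `extract_dataset_json("dataset2.zip", "/PATH/TO/MMceleba")` would leave dataset2.json directly inside the dataset directory.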

Training

Preparing your settings. To train a model, modify code/cfg/mmceleba.yml to adjust the settings you want. The default configuration trains on MMceleba with input and output image resolution set to 256×256 and BatchSize set to 4. Increasing the BatchSize may reduce semantic alignment after training, as a larger BatchSize weakens the constraint of the CLIP regularization loss.
Training the model. Run main.py from the code folder to start training:

cd /PixelFace+/code
CUDA_VISIBLE_DEVICES=0,1 torchrun --nproc_per_node 2 --master_port 10011 main.py --cfg cfg/mmceleba.yml

Testing the model. Once training has passed 70 epochs, the model is evaluated automatically every ten epochs. To change the evaluation frequency, edit line 675 of code/trainer.py.
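
To make the evaluation schedule concrete, here is a small sketch of the rule as described in the text. It is not the code from trainer.py: the function name `should_evaluate`, the strict "more than 70" comparison, and the 1-based epoch numbering are all assumptions made for illustration.

```python
def should_evaluate(epoch, start_epoch=70, interval=10):
    """Return True when `epoch` falls on the evaluation schedule described above.

    Assumes evaluation starts strictly after `start_epoch` and then
    happens on every multiple of `interval`.
    """
    return epoch > start_epoch and epoch % interval == 0


# With MAX_EPOCH set to 100, this reading of the rule evaluates three times.
eval_epochs = [e for e in range(1, 101) if should_evaluate(e)]
print(eval_epochs)  # [80, 90, 100]
```

Changing `interval` in this sketch corresponds to adjusting the evaluation frequency in the trainer.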

Testing

You can use the eval1 method (at line 732 of code/trainer.py) to generate images.

To generate an image from your own description, you can try copying the code from sample.py into code/trainer.py.

Pretrain Model

Download the pretrained model. Model link: https://pan.baidu.com/s/1ARSjz6IXCO2-8qf1Tf9p-A?pwd=qwer (extraction code: qwer).
Modify the cfg file code/cfg/mmceleba.yml to use the pretrained model:

TRAIN:
  FLAG: True

  ##### Modify This Line #####
  NET_G: '/PATH/TO/PRETRAIN/MODEL'

  B_NET_D: True
  BATCH_SIZE: 4  
  MAX_EPOCH: 100
  SNAPSHOT_INTERVAL: 1  
  DISCRIMINATOR_LR: 0.004
  GENERATOR_LR: 0.002
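
Since NET_G still holds a placeholder until it is edited, it can be worth checking the path before launching a multi-GPU run. The sketch below assumes the configuration has already been loaded into a nested dictionary that mirrors the fragment above; the helper name `check_pretrained_path` is hypothetical and is not part of the repository.

```python
import os


def check_pretrained_path(cfg):
    """Return the TRAIN.NET_G path if it points at an existing file, else raise."""
    path = cfg.get("TRAIN", {}).get("NET_G", "")
    if not path:
        raise ValueError("TRAIN.NET_G is empty")
    if not os.path.isfile(path):
        raise FileNotFoundError(f"TRAIN.NET_G does not point at a file: {path}")
    return path
```

The dictionary could come from a YAML loader such as PyYAML's `yaml.safe_load`, assuming that package is installed; the unedited placeholder `'/PATH/TO/PRETRAIN/MODEL'` would be reported as a missing file.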

Acknowledgement

Much of the code is borrowed from PixelFolder and PixelFace; thanks to their authors.
