Speech2Face

Introduction

Note: This repo is no longer maintained. For more up-to-date code, please visit this link.

Image synthesis has been a trending task for the AI community in recent years. Many works have shown the potential of Generative Adversarial Networks (GANs) for tasks such as text-to-image or audio-to-image synthesis. In particular, recent advances in deep learning using audio have inspired many works involving both visual and auditory information. In this work we propose a face synthesis method that is trained end-to-end using audio and/or language representations as inputs. We used this project as a baseline.

Requirements

pytorch
h5py
PIL
numpy
matplotlib

This implementation currently only supports running with GPUs.
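As a quick sanity check before training, the packages listed above can be probed from Python. The following is only a minimal sketch: the helper name `missing_modules` is hypothetical, and it assumes the usual import names (`torch` for pytorch, `PIL` for the imaging library). It checks that the modules can be found, not that their versions are compatible.

```python
import importlib.util

# Import names for the packages listed above.
# Assumption: "pytorch" is imported as "torch".
REQUIRED_MODULES = ["torch", "h5py", "PIL", "numpy", "matplotlib"]


def missing_modules(names):
    """Return the module names that cannot be found in this environment."""
    return [name for name in names if importlib.util.find_spec(name) is None]


missing = missing_modules(REQUIRED_MODULES)
if missing:
    print("Missing packages:", ", ".join(missing))
else:
    print("All listed packages are importable.")
```

Once PyTorch is installed, `torch.cuda.is_available()` reports whether a GPU is visible to it.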

Usage

Training

`python runtime.py`

Arguments:

type : GAN architecture to use (gan | wgan | vanilla_gan | vanilla_wgan). default = gan. "vanilla" means unconditional.
dataset : Dataset to use (birds | flowers). default = flowers
split : An integer indicating which split to use (0 : train | 1 : valid | 2 : test). default = 0
lr : The learning rate. default = 0.0002
diter : WGAN only. Number of discriminator iterations per generator iteration. default = 5
vis_screen : The visdom env name for visualization. default = gan
save_path : Path for saving the models.
l1_coef : L1 loss coefficient in the generator loss function for gan and vanilla_gan. default = 50
l2_coef : Feature matching coefficient in the generator loss function for gan and vanilla_gan. default = 100
pre_trained_disc : Path to a pre-trained discriminator model used to initialize training.
pre_trained_gen : Path to a pre-trained generator model used to initialize training.
batch_size : Batch size. default = 64
num_workers : Number of dataloader workers used for fetching data. default = 8
epochs : Number of training epochs. default = 200
cls : Boolean flag indicating whether to train with the cls algorithm. default = False
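To make the argument list concrete, here is a minimal sketch of how these options could be declared with `argparse`. It is not taken from `runtime.py`: the `--name` flag spelling, the `build_parser` helper, and the `None` defaults for the path arguments (which have no documented default) are assumptions made for illustration.

```python
import argparse


def build_parser():
    """Build a parser mirroring the documented arguments (illustrative only)."""
    p = argparse.ArgumentParser(description="Training options (sketch)")
    p.add_argument("--type", default="gan",
                   choices=["gan", "wgan", "vanilla_gan", "vanilla_wgan"])
    p.add_argument("--dataset", default="flowers", choices=["birds", "flowers"])
    p.add_argument("--split", type=int, default=0, choices=[0, 1, 2])
    p.add_argument("--lr", type=float, default=0.0002)
    p.add_argument("--diter", type=int, default=5)  # only meaningful for WGAN
    p.add_argument("--vis_screen", default="gan")
    # Assumption: no default is documented for the path arguments, so None is used.
    p.add_argument("--save_path", default=None)
    p.add_argument("--pre_trained_disc", default=None)
    p.add_argument("--pre_trained_gen", default=None)
    p.add_argument("--l1_coef", type=float, default=50)
    p.add_argument("--l2_coef", type=float, default=100)
    p.add_argument("--batch_size", type=int, default=64)
    p.add_argument("--num_workers", type=int, default=8)
    p.add_argument("--epochs", type=int, default=200)
    p.add_argument("--cls", action="store_true")  # False unless the flag is passed
    return p


# Parse an explicit argument list instead of sys.argv, for demonstration.
args = build_parser().parse_args(["--type", "wgan", "--diter", "5"])
print(vars(args))
```

Under the same assumptions, a WGAN run on the default dataset would be started with something like `python runtime.py --type wgan --diter 5`.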

References

[1] Generative Adversarial Text-to-Image Synthesis https://arxiv.org/abs/1605.05396

[2] Improved Techniques for Training GANs https://arxiv.org/abs/1606.03498

[3] Wasserstein GAN https://arxiv.org/abs/1701.07875

[4] Improved Training of Wasserstein GANs https://arxiv.org/pdf/1704.00028.pdf

franroldans/tfm-franroldan-wav2pix
