
Some details about the training parameters. #2

Closed
lionel3 opened this issue Aug 7, 2018 · 5 comments


lionel3 commented Aug 7, 2018

I trained the network from scratch but got poor results.
Below are my training parameters (the same as in the training guide you provide).

--program_name=twingan
--dataset_name="image_only"
--dataset_dir="dir to celeba tfrecord files"
--unpaired_target_dataset_name="anime_faces"
--unpaired_target_dataset_dir="dir to the anime tfrecord you provided"
--train_dir="dir to save results"
--dataset_split_name=train
--preprocessing_name="danbooru"
--resize_mode=RANDOM_CROP
--do_random_cropping=True
--learning_rate=0.0001
--learning_rate_decay_type=fixed
--is_training=True
--generator_network="pggan"
--loss_architecture=dragan
--pggan_max_num_channels=256
--generator_norm_type=batch_renorm
--use_ttur=True
--num_images_per_resolution=50000

Compared with the official PGGAN repo, I found some differences:

  1. num_images_per_resolution in TwinGAN is 50000, while in PGGAN it is 600000.
  2. TwinGAN uses RANDOM_CROP, while PGGAN resizes the image directly.

Could you please help me out here?


jerryli27 commented Aug 7, 2018

Yes, you're right. Sorry for the incorrect documentation. I'll push a newer version shortly.

  1. The num_images_per_resolution I used was 300000. Of course 600000 should also work, but it takes longer to train.
  2. Please change to --resize_mode=RESHAPE.

FYI, --do_random_cropping=True is there in case the face is not at the center of the image at inference time; you can also try RANDOM_CROP if the output quality is too poor in that situation.
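To illustrate the difference between the two modes, here is a minimal TF 1.x-style sketch (hypothetical helper name, not the repo's actual preprocessing code):

```python
import tensorflow as tf

def resize_for_training(image, target_hw, resize_mode="RESHAPE"):
    """Hypothetical sketch of the two --resize_mode options discussed above.

    RESHAPE:     resize the whole image to target_hw x target_hw (what PGGAN does).
    RANDOM_CROP: take a random target_hw x target_hw crop, which can help when
                 faces are not centered at inference time.
    """
    if resize_mode == "RESHAPE":
        return tf.image.resize_images(image, [target_hw, target_hw])
    elif resize_mode == "RANDOM_CROP":
        return tf.random_crop(image, [target_hw, target_hw, 3])
    raise ValueError("Unknown resize_mode: %s" % resize_mode)
```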

I am rerunning the exact code that I provided in the training example code. It will take a day or two for me to verify that it works.


lionel3 commented Aug 8, 2018

Thanks for your answer.

Besides, when training with
--hw_to_batch_size="{4: 16, 8: 16, 16: 16, 32: 16, 64: 12, 128: 12, 256: 12, 512: 6}",
I got ResourceExhaustedError: OOM when allocating tensor with ... during the fade-in phase from resolution 128 to 256. I got the same error when trying 2 GPUs.
I am not familiar with TensorFlow; I guess there may be a bug in the multi-GPU training.

I will try to reproduce the error and share more training details once I have an idle GPU.

jerryli27 commented

I added two lines to the training script. It should work now:

--gradient_penalty_lambda=0.25
--use_unet=True
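For context, a DRAGAN-style penalty with this lambda roughly computes the following (a minimal TF 1.x sketch with a hypothetical discriminator_fn, not the repo's exact code):

```python
import tensorflow as tf

def dragan_gradient_penalty(discriminator_fn, real_images, penalty_lambda=0.25):
    """Rough sketch of a DRAGAN-style gradient penalty (hypothetical helper).

    penalty_lambda corresponds to --gradient_penalty_lambda above.
    discriminator_fn maps a batch of images to discriminator logits.
    """
    # Perturb real samples within roughly half a standard deviation of the batch.
    _, batch_variance = tf.nn.moments(real_images, axes=[0, 1, 2, 3])
    noise = 0.5 * tf.sqrt(batch_variance) * tf.random_uniform(tf.shape(real_images))
    alpha = tf.random_uniform(tf.shape(real_images), minval=0.0, maxval=1.0)
    interpolated = real_images + alpha * noise
    # Push the discriminator's gradient norm at the perturbed points toward 1.
    gradients = tf.gradients(discriminator_fn(interpolated), [interpolated])[0]
    grad_norm = tf.sqrt(tf.reduce_sum(tf.square(gradients), axis=[1, 2, 3]) + 1e-10)
    return penalty_lambda * tf.reduce_mean(tf.square(grad_norm - 1.0))
```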

The whole script now looks like:

# Assume you have data like
# ./data/celeba/train-00000-of-00100.tfrecord,
# ./data/celeba/train-00001-of-00100.tfrecord ...
python pggan_runner.py \
  --program_name=twingan \
  --dataset_name="image_only" \
  --dataset_dir="./data/celeba/" \
  --unpaired_target_dataset_name="anime_faces" \
  --unpaired_target_dataset_dir="./data/anime_faces/" \
  --train_dir="./checkpoints/twingan_faces/" \
  --dataset_split_name=train \
  --preprocessing_name="danbooru" \
  --resize_mode=RESHAPE \
  --do_random_cropping=True \
  --learning_rate=0.0001 \
  --learning_rate_decay_type=fixed \
  --is_training=True \
  --generator_network="pggan" \
  --use_unet=True \
  --num_images_per_resolution=300000 \
  --loss_architecture=dragan \
  --gradient_penalty_lambda=0.25 \
  --pggan_max_num_channels=256 \
  --generator_norm_type=batch_renorm \
  --hw_to_batch_size="{4: 8, 8: 8, 16: 8, 32: 8, 64: 8, 128: 4, 256: 3, 512: 2}"
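The smaller per-resolution batch sizes here are what keep memory in check during the 128 → 256 fade-in; a minimal sketch of how such a mapping can be consumed (hypothetical helper, not the repo's actual parsing code):

```python
import ast

def batch_size_for_resolution(hw_to_batch_size_flag, resolution):
    """Pick the batch size for the current PGGAN resolution (sketch only).

    hw_to_batch_size_flag is the string passed via --hw_to_batch_size.
    Smaller batches at higher resolutions avoid OOM during fade-in.
    """
    mapping = ast.literal_eval(hw_to_batch_size_flag)
    return mapping[resolution]

# Example: the batch size used once the network grows to 256x256.
print(batch_size_for_resolution(
    "{4: 8, 8: 8, 16: 8, 32: 8, 64: 8, 128: 4, 256: 3, 512: 2}", 256))  # -> 3
```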

I haven't tested the multi-GPU setting thoroughly yet due to hardware limits, so yes, there may be a bug, but you can try adding the following flags.

--sync_replicas=False
--replicas_to_aggregate=1
--num_clones=2
--worker_replicas=1

I updated the training readme with the comments above.


lionel3 commented Aug 9, 2018

Thanks, I will try it out asap.

lionel3 closed this as completed Aug 9, 2018
lionel3 reopened this Aug 9, 2018
jerryli27 self-assigned this Aug 10, 2018
jerryli27 commented

Hi @lionel3, I updated the training documentation. There was indeed a bug in my default parameters. After fixing it, I am able to reproduce my previous results.

Please sync to the latest version and see https://github.com/jerryli27/TwinGAN/blob/master/docs/training.md .

The parameters I added are:

--do_pixel_norm=True
--l_content_weight=0.1
--l_cycle_weight=1.0
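For reference, --do_pixel_norm presumably enables PGGAN-style pixelwise feature normalization, and the two weights scale the content and cycle-consistency loss terms; a minimal sketch of the normalization (not the repo's exact code):

```python
import tensorflow as tf

def pixel_norm(features, epsilon=1e-8):
    """Pixelwise feature-vector normalization from the PGGAN paper (sketch).

    Each spatial location's channel vector is rescaled to unit average square,
    which is what --do_pixel_norm=True enables in spirit.
    """
    return features * tf.rsqrt(
        tf.reduce_mean(tf.square(features), axis=-1, keepdims=True) + epsilon)
```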

Please reopen this issue if you cannot reproduce. Thanks!
