
Question: monet2photo training loss #30

Closed
filmo opened this issue May 17, 2017 · 5 comments

filmo commented May 17, 2017

I'm trying to train the monet2photo. My command line was:

python train.py --dataroot ./datasets/monet2photo --name monet2photo --model cycle_gan --gpu_ids 0,1 --batchSize 8 --identity 0.5

The paper discussed using a batch size of 1, but I increased it to 8 to more fully occupy the GPUs. I think this is the only difference between what was described in the paper and my settings, but I may be wrong.
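(For anyone reading later: the --identity 0.5 flag adds an identity-mapping term on top of the adversarial and cycle-consistency losses, weighted by the lambda values below. A minimal sketch of how such a term is typically computed; the function and variable names here are illustrative, not the repo's exact code.)

```python
import torch.nn as nn

def identity_term(G_A2B, G_B2A, real_A, real_B,
                  lambda_idt=0.5, lambda_A=10.0, lambda_B=10.0):
    """Identity loss: each generator should leave images from its target
    domain unchanged, e.g. G_A2B(photo) should stay close to the photo."""
    l1 = nn.L1Loss()
    idt_B = G_A2B(real_B)  # run a real B image through the A->B generator
    idt_A = G_B2A(real_A)  # run a real A image through the B->A generator
    return (l1(idt_B, real_B) * lambda_B + l1(idt_A, real_A) * lambda_A) * lambda_idt
```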

------------ Options -------------
align_data: False
batchSize: 8
beta1: 0.5
checkpoints_dir: ./checkpoints
continue_train: False
dataroot: ./datasets/monet2photo
display_freq: 100
display_id: 1
display_winsize: 256
fineSize: 256
gpu_ids: [0, 1]
identity: 0.5
input_nc: 3
isTrain: True
lambda_A: 10.0
lambda_B: 10.0
loadSize: 286
lr: 0.0002
max_dataset_size: inf
model: cycle_gan
nThreads: 2
n_layers_D: 3
name: monet2photo
ndf: 64
ngf: 64
niter: 100
niter_decay: 100
no_flip: False
no_html: False
no_lsgan: False
norm: instance
output_nc: 3
phase: train
pool_size: 50
print_freq: 100
save_epoch_freq: 5
save_latest_freq: 5000
serial_batches: False
use_dropout: False
which_direction: AtoB
which_epoch: latest
which_model_netD: basic
which_model_netG: resnet_9blocks
-------------- End ----------------
UnalignedDataLoader
#training images = 6287
cycle_gan

I'm training on two GTX 1070s.

I'm about 80 epochs in (~40 hours on my setup), and it seems like I'm oscillating between generated 'photos' that look okay-ish and 'photos' that look pretty 'meh', more like the original painting.

My loss declined pretty rapidly for the first 20 or so epochs, but now seems to be relatively stable with occasional crazy spikes:

[plot: overall training loss]

I think it's improving slightly with each epoch based on the images, and there seems to be a slight downward trend in the loss, but I also might just be kidding myself because I've been staring at it for a while. In other words, I'm not certain that what it's generating at epoch 80 is really that much better than epoch 30. Here's the most recent detailed loss curve.

[plot: recent detailed loss curve]

Question: Is this expected behavior (more or less), or should I be concerned that I've plateaued and/or used the wrong settings? At 100 epochs the learning rate is set to start decreasing based on the default settings. Given that it's taking about 30 minutes per epoch, and thus about 61 more hours to complete 200 epochs, I'm wondering if I should "keep on going" or "abort" and fix some settings.
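(For context on that last point: with niter: 100 and niter_decay: 100, the default schedule keeps the learning rate constant for the first 100 epochs and then decays it roughly linearly to zero over the next 100. A minimal sketch of that rule as a standard PyTorch scheduler; the helper name is illustrative, not the repo's exact code.)

```python
import torch

def linear_decay_scheduler(optimizer, niter=100, niter_decay=100):
    # LR multiplier stays at 1.0 for the first `niter` epochs, then falls
    # linearly toward 0 over the following `niter_decay` epochs.
    def lr_lambda(epoch):
        return 1.0 - max(0, epoch + 1 - niter) / float(niter_decay + 1)
    return torch.optim.lr_scheduler.LambdaLR(optimizer, lr_lambda=lr_lambda)
```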


filmo commented May 25, 2017

I"m not sure how representative this is, but here's my final loss for the discriminators and generators.
There's visible oscillation from about 20 epochs to 100 epochs for Generator B as well as D_B.

Once the learning rate started decaying at epoch 100, the G_B loss slowly increased and G_A seemed to converge in the 0.35 to 0.40 range. Both discriminators stopped oscillating and their losses gradually decreased.

Perhaps this will be useful to someone doing the same. I used instance norm; perhaps I should have used batch norm since I was running batch_size = 8.
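(A rough sketch of what that swap looks like, assuming a small helper rather than the repo's exact builder. The relevant difference is that instance norm uses per-sample statistics, so it is insensitive to batch size, while batch norm shares statistics across the batch and could behave differently at batchSize 8.)

```python
import functools
import torch.nn as nn

def get_norm_layer(norm_type='instance'):
    # Instance norm: normalizes each image on its own.
    # Batch norm: normalizes using statistics pooled over the whole batch.
    if norm_type == 'instance':
        return functools.partial(nn.InstanceNorm2d, affine=False, track_running_stats=False)
    if norm_type == 'batch':
        return functools.partial(nn.BatchNorm2d, affine=True, track_running_stats=True)
    raise ValueError('unknown norm type: %s' % norm_type)
```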

[plot: final generator and discriminator losses, 2017-05-24, monet2photo]

junyanz (Owner) commented Jun 9, 2017

The losses are not so interpretable, as G and D are optimizing a minimax game. The plots you posted here look quite typical to me (except for the spike). I would mainly focus on the quality of the images.
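(Concretely, since no_lsgan is False in the options above, the adversarial term is the least-squares GAN loss, and D and G pull the same discriminator predictions toward opposite targets. A rough sketch with illustrative names, not the repo's exact code:)

```python
import torch
import torch.nn as nn

mse = nn.MSELoss()  # least-squares GAN criterion

def loss_D(D, real, fake):
    # D is trained to score real images as 1 and generated images as 0 ...
    pred_real, pred_fake = D(real), D(fake.detach())
    return 0.5 * (mse(pred_real, torch.ones_like(pred_real)) +
                  mse(pred_fake, torch.zeros_like(pred_fake)))

def loss_G_adv(D, fake):
    # ... while G is trained to make D score the same fakes as 1, so a drop
    # in one loss tends to push the other back up rather than to zero.
    pred_fake = D(fake)
    return mse(pred_fake, torch.ones_like(pred_fake))
```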

@filmo filmo closed this as completed Jun 19, 2017
@John1231983

@filmo: Have you found the reason for the spike in the loss and how to solve it?


filmo commented Oct 20, 2019

No, I didn't end up exploring it further.

@John1231983

Thanks. Let's follow this issue to see how it gets solved:
#807
