
100 epochs with 10,000 images from celebA... still noise? #4

Closed
RtFishers opened this issue Oct 24, 2017 · 15 comments

Comments

@RtFishers

RtFishers commented Oct 24, 2017

Hi, thanks very much for adding more layers so that the network can generate higher-res images...

I'm a bit confused about how to go about training properly. I put 10,000 images from "img_align_celebA" into the landscape/images folder and ran "DATA_ROOT=landscape dataset=folder ndf=30 ngf=90 th main.lua", but I'm still getting almost pure noise in the localhost:8000 display... is this normal?

@robbiebarrat
Owner

robbiebarrat commented Oct 24, 2017

Obviously something is wrong. I trained on celebA (didn't put up the weights, as it isn't really 'art') and got pretty good results (recognizable as a face) pretty early on...

Make sure that your display hasn't crashed and stopped updating - run th -ldisplay.start in a new terminal. Also, can you please paste in a selection of the training output (just the logs from one of the later epochs)? Has one loss dipped down to zero, i.e. is the generator or discriminator winning out over the other?
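
For reference, a minimal recap of the two commands in play here, run in separate terminals (a sketch assuming the display package is installed and the folder layout described above):

# terminal 1: start the display server, then open http://localhost:8000
th -ldisplay.start

# terminal 2: train on the folder dataset under landscape/images
DATA_ROOT=landscape dataset=folder ndf=30 ngf=90 th main.lua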

EDIT: sorry - didn't mean to close this issue :P

@RtFishers
Author

Okay, never mind... I removed two folders in my "art-DCGAN-master" directory that I believe may have been screwing with the process... one named "images" and another named "folder"... both empty.
I'm getting blocky noise that resembles the images now - using the landscape images scraped from the wiki (around 1,250 images).
How many epochs does it usually take until you get images that resemble the dataset?

@RtFishers
Author

RtFishers commented Oct 25, 2017

Hmmm... I wonder if it's still supposed to look like this after 100 epochs: https://imgur.com/a/VOG63

Here is a portion from the logs:
Epoch: [95][ 0 / 13] Time: 0.266 DataTime: 0.000 Err_G: 0.6444 Err_D: 1.4740
Epoch: [95][ 1 / 13] Time: 0.177 DataTime: 0.000 Err_G: 1.3765 Err_D: 1.6764
Epoch: [95][ 2 / 13] Time: 0.282 DataTime: 0.000 Err_G: 0.7322 Err_D: 1.3951
Epoch: [95][ 3 / 13] Time: 0.286 DataTime: 0.000 Err_G: 1.5892 Err_D: 0.8094
Epoch: [95][ 4 / 13] Time: 1.161 DataTime: 1.004 Err_G: 1.4294 Err_D: 0.6988
Epoch: [95][ 5 / 13] Time: 0.284 DataTime: 0.000 Err_G: 1.3500 Err_D: 0.7171
Epoch: [95][ 6 / 13] Time: 0.283 DataTime: 0.000 Err_G: 1.1339 Err_D: 0.8040
Epoch: [95][ 7 / 13] Time: 0.293 DataTime: 0.134 Err_G: 1.7658 Err_D: 0.6731
Epoch: [95][ 8 / 13] Time: 1.090 DataTime: 0.932 Err_G: 0.9855 Err_D: 0.9451
Epoch: [95][ 9 / 13] Time: 0.748 DataTime: 0.252 Err_G: 1.2098 Err_D: 1.4906
Epoch: [95][ 10 / 13] Time: 0.186 DataTime: 0.007 Err_G: 0.6246 Err_D: 1.4969
Epoch: [95][ 11 / 13] Time: 0.967 DataTime: 0.810 Err_G: 2.3561 Err_D: 1.0235
Epoch: [95][ 12 / 13] Time: 1.040 DataTime: 0.882 Err_G: 0.5786 Err_D: 1.5334
Epoch: [95][ 13 / 13] Time: 0.287 DataTime: 0.000 Err_G: 2.3278 Err_D: 1.0305
End of epoch 95 / 100 Time Taken: 8.909
Epoch: [96][ 0 / 13] Time: 0.271 DataTime: 0.000 Err_G: 0.7569 Err_D: 1.3361
Epoch: [96][ 1 / 13] Time: 0.174 DataTime: 0.000 Err_G: 1.4464 Err_D: 1.1587
Epoch: [96][ 2 / 13] Time: 0.288 DataTime: 0.000 Err_G: 1.1522 Err_D: 1.6824
Epoch: [96][ 3 / 13] Time: 0.285 DataTime: 0.000 Err_G: 0.6205 Err_D: 1.5008
Epoch: [96][ 4 / 13] Time: 1.099 DataTime: 0.937 Err_G: 2.8557 Err_D: 2.0824
Epoch: [96][ 5 / 13] Time: 0.462 DataTime: 0.301 Err_G: 0.2834 Err_D: 1.9902
Epoch: [96][ 6 / 13] Time: 0.702 DataTime: 0.545 Err_G: 1.3848 Err_D: 1.0088
Epoch: [96][ 7 / 13] Time: 0.284 DataTime: 0.000 Err_G: 0.8531 Err_D: 1.1756
Epoch: [96][ 8 / 13] Time: 0.618 DataTime: 0.462 Err_G: 1.3284 Err_D: 1.3724
Epoch: [96][ 9 / 13] Time: 1.511 DataTime: 0.335 Err_G: 0.7013 Err_D: 1.2385
Epoch: [96][ 10 / 13] Time: 0.218 DataTime: 0.036 Err_G: 2.0415 Err_D: 1.0599
Epoch: [96][ 11 / 13] Time: 0.739 DataTime: 0.584 Err_G: 0.7567 Err_D: 1.1829
Epoch: [96][ 12 / 13] Time: 0.338 DataTime: 0.180 Err_G: 1.2995 Err_D: 1.1113
Epoch: [96][ 13 / 13] Time: 1.486 DataTime: 1.330 Err_G: 1.1576 Err_D: 1.0317
End of epoch 96 / 100 Time Taken: 9.749
Epoch: [97][ 0 / 13] Time: 0.269 DataTime: 0.000 Err_G: 1.3777 Err_D: 1.2087
Epoch: [97][ 1 / 13] Time: 0.174 DataTime: 0.000 Err_G: 0.5777 Err_D: 1.4291
Epoch: [97][ 2 / 13] Time: 0.287 DataTime: 0.000 Err_G: 1.2322 Err_D: 1.4589
Epoch: [97][ 3 / 13] Time: 1.049 DataTime: 0.893 Err_G: 0.3955 Err_D: 1.7495
Epoch: [97][ 4 / 13] Time: 0.288 DataTime: 0.000 Err_G: 0.9927 Err_D: 1.9682
Epoch: [97][ 5 / 13] Time: 0.281 DataTime: 0.000 Err_G: 0.6527 Err_D: 1.6678
Epoch: [97][ 6 / 13] Time: 0.285 DataTime: 0.000 Err_G: 1.0144 Err_D: 1.0721
Epoch: [97][ 7 / 13] Time: 0.900 DataTime: 0.743 Err_G: 1.3015 Err_D: 0.8945
Epoch: [97][ 8 / 13] Time: 0.841 DataTime: 0.684 Err_G: 1.3483 Err_D: 0.6924
Epoch: [97][ 9 / 13] Time: 0.662 DataTime: 0.155 Err_G: 1.4783 Err_D: 1.0527
Epoch: [97][ 10 / 13] Time: 1.082 DataTime: 0.914 Err_G: 0.4055 Err_D: 1.6375
Epoch: [97][ 11 / 13] Time: 0.574 DataTime: 0.416 Err_G: 2.6284 Err_D: 1.5000
Epoch: [97][ 12 / 13] Time: 0.364 DataTime: 0.206 Err_G: 0.4391 Err_D: 1.5753
Epoch: [97][ 13 / 13] Time: 0.285 DataTime: 0.091 Err_G: 1.7904 Err_D: 0.8829
End of epoch 97 / 100 Time Taken: 9.038
Epoch: [98][ 0 / 13] Time: 0.270 DataTime: 0.000 Err_G: 1.3242 Err_D: 0.7591
Epoch: [98][ 1 / 13] Time: 0.174 DataTime: 0.000 Err_G: 0.5229 Err_D: 1.5423
Epoch: [98][ 2 / 13] Time: 0.557 DataTime: 0.400 Err_G: 1.8146 Err_D: 1.1900
Epoch: [98][ 3 / 13] Time: 0.447 DataTime: 0.288 Err_G: 0.9109 Err_D: 1.1277
Epoch: [98][ 4 / 13] Time: 0.570 DataTime: 0.414 Err_G: 1.2145 Err_D: 0.9000
Epoch: [98][ 5 / 13] Time: 0.286 DataTime: 0.000 Err_G: 1.3356 Err_D: 0.9014
Epoch: [98][ 6 / 13] Time: 0.287 DataTime: 0.000 Err_G: 1.5560 Err_D: 0.8859
Epoch: [98][ 7 / 13] Time: 0.746 DataTime: 0.588 Err_G: 1.1245 Err_D: 0.7293
Epoch: [98][ 8 / 13] Time: 1.034 DataTime: 0.875 Err_G: 1.1159 Err_D: 0.9739
Epoch: [98][ 9 / 13] Time: 0.741 DataTime: 0.165 Err_G: 1.3555 Err_D: 1.0987
Epoch: [98][ 10 / 13] Time: 0.724 DataTime: 0.550 Err_G: 0.5194 Err_D: 1.6065
Epoch: [98][ 11 / 13] Time: 0.323 DataTime: 0.166 Err_G: 0.9838 Err_D: 1.5638
Epoch: [98][ 12 / 13] Time: 0.515 DataTime: 0.359 Err_G: 2.6404 Err_D: 1.1168
Epoch: [98][ 13 / 13] Time: 0.295 DataTime: 0.000 Err_G: 0.2485 Err_D: 2.4634
End of epoch 98 / 100 Time Taken: 8.713
Epoch: [99][ 0 / 13] Time: 0.269 DataTime: 0.000 Err_G: 2.0790 Err_D: 1.5072
Epoch: [99][ 1 / 13] Time: 0.173 DataTime: 0.000 Err_G: 1.6221 Err_D: 0.9545
Epoch: [99][ 2 / 13] Time: 0.286 DataTime: 0.000 Err_G: 0.6393 Err_D: 1.4517
Epoch: [99][ 3 / 13] Time: 0.282 DataTime: 0.000 Err_G: 0.8820 Err_D: 1.2528
Epoch: [99][ 4 / 13] Time: 0.349 DataTime: 0.191 Err_G: 1.6816 Err_D: 1.0430
Epoch: [99][ 5 / 13] Time: 0.440 DataTime: 0.283 Err_G: 0.7517 Err_D: 1.1776
Epoch: [99][ 6 / 13] Time: 0.486 DataTime: 0.327 Err_G: 1.1272 Err_D: 0.9839
Epoch: [99][ 7 / 13] Time: 1.124 DataTime: 0.968 Err_G: 1.4441 Err_D: 1.2618
Epoch: [99][ 8 / 13] Time: 0.283 DataTime: 0.000 Err_G: 0.3661 Err_D: 1.5913
Epoch: [99][ 9 / 13] Time: 0.839 DataTime: 0.351 Err_G: 1.6691 Err_D: 1.8219
Epoch: [99][ 10 / 13] Time: 1.221 DataTime: 1.052 Err_G: 0.5908 Err_D: 1.5005
Epoch: [99][ 11 / 13] Time: 0.467 DataTime: 0.312 Err_G: 1.6038 Err_D: 1.1185
Epoch: [99][ 12 / 13] Time: 0.284 DataTime: 0.000 Err_G: 0.7994 Err_D: 1.1239
Epoch: [99][ 13 / 13] Time: 0.287 DataTime: 0.116 Err_G: 1.8182 Err_D: 1.0219
End of epoch 99 / 100 Time Taken: 8.841
Epoch: [100][ 0 / 13] Time: 0.271 DataTime: 0.000 Err_G: 0.9349 Err_D: 1.0812
Epoch: [100][ 1 / 13] Time: 0.174 DataTime: 0.000 Err_G: 1.3770 Err_D: 1.0310
Epoch: [100][ 2 / 13] Time: 0.282 DataTime: 0.000 Err_G: 1.0230 Err_D: 1.1099
Epoch: [100][ 3 / 13] Time: 0.283 DataTime: 0.000 Err_G: 0.9857 Err_D: 1.0808
Epoch: [100][ 4 / 13] Time: 1.084 DataTime: 0.927 Err_G: 0.9773 Err_D: 1.2150
Epoch: [100][ 5 / 13] Time: 0.374 DataTime: 0.217 Err_G: 1.2164 Err_D: 1.1224
Epoch: [100][ 6 / 13] Time: 0.286 DataTime: 0.000 Err_G: 0.7667 Err_D: 1.0097
Epoch: [100][ 7 / 13] Time: 0.282 DataTime: 0.000 Err_G: 2.4329 Err_D: 1.0925
Epoch: [100][ 8 / 13] Time: 1.605 DataTime: 1.450 Err_G: 0.6667 Err_D: 1.2033
Epoch: [100][ 9 / 13] Time: 0.618 DataTime: 0.000 Err_G: 2.2147 Err_D: 0.9510
Epoch: [100][ 10 / 13] Time: 0.335 DataTime: 0.165 Err_G: 0.6985 Err_D: 1.1703
Epoch: [100][ 11 / 13] Time: 1.274 DataTime: 1.118 Err_G: 1.7755 Err_D: 1.1486
Epoch: [100][ 12 / 13] Time: 0.303 DataTime: 0.145 Err_G: 0.8030 Err_D: 1.3872
Epoch: [100][ 13 / 13] Time: 0.349 DataTime: 0.191 Err_G: 1.0955 Err_D: 1.1228
End of epoch 100 / 100 Time Taken: 9.174

I think maybe I'm missing something?

@RtFishers
Author

Also... what happens if I run the command "DATA_ROOT=landscape dataset=folder ndf=50 ngf=150 th main.lua" AFTER I have already run it (choosing not to load from a checkpoint)? Does it just start the process over completely or does the file in the "cache" folder get involved?

@robbiebarrat
Owner

Delete the contents of your cache folder - it builds the dataset into arrays to be used by the network, and I think you are only training on a small portion of the dataset (the 13 in the logs is your number of batches per epoch, and it should be a lot larger). Delete the contents of cache/, make sure that you have at least a few thousand landscapes in your landscape/images folder, and begin training again... Let me know if that doesn't fix it.
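
As a quick sanity check on that batch count: batches per epoch should be roughly the image count divided by the batch size (64 by default, per the comments below), so 10,000 images should give about 156 batches per epoch, not 13:

# assuming batch size 64 and floor division
echo $((10000 / 64))   # 156; a logged count of 13 implies only ~13*64 = 832 images were read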

@RtFishers
Author

How big should it be? I deleted the cache file and now it's at 19...

@robbiebarrat
Owner

What's your batch size and number of images in your folder?

@RtFishers
Author

RtFishers commented Oct 25, 2017

Maybe my landscape dataset is too small? There are only 1,262 or so images, and the batch size is the default - 64.
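
(That arithmetic matches the batch count above: 1262 / 64 rounds down to 19 batches per epoch, so the cache is now picking up the whole dataset.)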

@RtFishers
Author

Oh yes... just for clarification:
Torch7, Lua 5.2.4 (though I think my Torch is using LuaJIT), CUDA 8.0, cuDNN 5.1...

@RtFishers
Author

But I didn't install the cudnn or cunn luarocks packages, because they caused errors when I ran the training code.

@RtFishers
Author

I think I'm gonna try with all 200,000+ images from celebA and see what happens... but I'm pretty sure I'm not getting the results I should be.

@robbiebarrat
Owner

Hmm... I doubt cudnn has anything to do with it, although you may run into some errors loading from saved models (don't quote me on that)...

I think that the dataset size is the problem; I've gotten some very strange results when trying to train on data under ~3,000 images...

Let me know what you get with celebA - as that should definitely work. If it doesn't, send me your entire project folder (minus the dataset, maybe) on google drive or something, and I'll take a look myself.

Keep in mind that the project is currently undergoing a total overhaul - it's being reimplemented in Python/Keras instead of Torch, with a better model - so if we're unable to solve your problem now, it shouldn't be an issue anymore in a week or two after the update.

@RtFishers
Author

Okay, great to hear :). I will report back shortly.

@RtFishers
Author

RtFishers commented Oct 25, 2017

Okay, it works great now!... although I had to adjust the filter counts (ndf:ngf) from 50:150 to 20:120 - otherwise my discriminator overpowers my generator every single time and the output just remains noise forever.

@robbiebarrat
Owner

Ah - nice, glad you got it working. Also, yeah, that makes sense. GANs are really hard to train - if you don't set all the hyperparameters just right, everything gets screwed up.

That's actually the reason I wanted to see your logs: usually, if the discriminator wins over the generator, d_loss goes down and hangs around 0.00001 or a similarly low value... The discriminator does have the easier job, so it often wins out unless you severely handicap its number of filters.

You might have to play around with different numbers of filters per network when moving to different datasets, because the 20:120 ratio might not work for all of them.
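
For example, the invocation with the filter counts that worked here would look like this (a sketch, assuming the same folder-dataset setup used earlier in the thread; adjust DATA_ROOT for your own dataset):

DATA_ROOT=landscape dataset=folder ndf=20 ngf=120 th main.lua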
