More layers=Worse Result? #96
Comments
@mallorbc it would be helpful if you could post some examples. Short of that, all I can say is that you'll need to decrease your learning rate by very small amounts until the image stabilizes. Having said that, a learning rate of 1e-5 (the default, I believe) has worked just fine for me at 44 layers, so that's a bit strange. I'm not sure the model is really capable of converging at 64 layers either; I don't know, I've tried to find a stable learning rate for that many layers and failed. If you could post an example of the exact same prompt at 16, 24, 32, and finally 44 layers, it would be very helpful (see the sketch below for one way to set that up). I personally think 64 layers is past the point of diminishing returns, for whatever reason that is.
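A minimal sketch of that comparison, assuming the `deep_daze` Python API with an `Imagine` class and `text`, `num_layers`, `lr`, and `save_progress` keyword arguments (check the signature of your installed version; the prompt here is just a placeholder):

```python
from deep_daze import Imagine

PROMPT = "a field of flowers at sunset"  # keep the prompt identical across runs

# Sweep depth only, holding the learning rate fixed, so the renders stay comparable.
for layers in (16, 24, 32, 44):
    imagine = Imagine(
        text=PROMPT,
        num_layers=layers,
        lr=1e-5,             # the default mentioned above; lower it slightly if deeper runs diverge
        save_progress=True,  # keep intermediate frames so any divergence is visible
    )
    imagine()
```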
Check out this table I've made, where you can see a quick comparison of layer/learning-rate combinations; if you mouse over the final image, the video of the training plays. Generated images are 416x416, which leaves some headroom on a 24GB 3090 (lucky to have one too).
That's a very nice overview! Regarding the question of how to make optimal use of large amounts of RAM: at some point it becomes better to go into width rather than depth. In #103 I just added the option to set the hidden size. Here you can see a shift in hidden size from 64 to 512, doubling each time per row. I trained for 3 epochs, with 44 layers and a batch size of 16. For a hidden size of 512 the pictures shown are from only 1/6 of the training duration - notice how they already look quite converged. If I showed the final images, they would have diverged, a bit like in the lower right of the matrix above by @mroosen.
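A rough illustration of trading depth for width, assuming the option added in #103 is exposed as a `hidden_size` keyword on `Imagine` (the other keyword names and values mirror the settings described above, but are likewise assumptions to check against the current signature):

```python
from deep_daze import Imagine

# Fixed depth and batch size, sweeping width instead (64 -> 512, doubling each step).
for width in (64, 128, 256, 512):
    imagine = Imagine(
        text="a field of flowers at sunset",  # placeholder prompt
        num_layers=44,
        hidden_size=width,  # assumed name of the width option added in #103
        batch_size=16,
        epochs=3,
    )
    imagine()
```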
I have been blessed to get an RTX 3090, so I can run this model with many layers and a large batch size.
I have tried 64 layers, 44 layers, 32 layers, 16 layers, etc. In the runs I have done, it seems to me that, at least for 64 and 44 layers, the produced results are actually worse than with a lower number of layers. By worse, I mean less colorful and blurrier.
Is there a reason for this? Maybe it's due to the batch size? Any insight would be great.