More layers=Worse Result? #96

Open
mallorbc opened this issue Mar 16, 2021 · 3 comments
Comments

@mallorbc

I have been blessed to have been able to get an RTX 3090, and thus I can run this model with many layers and large batch size.

I have tried 64 layers, 44 layers, 32 layers, 16 layers, etc. In the runs I have done, it seems that, at least for 64 and 44 layers, the produced results are actually worse than with fewer layers. By worse, I mean less colorful and more blurry.

Is there a reason for this? Maybe it's due to the batch size? Any insight would be great.
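
For anyone who wants to reproduce such a comparison, here is a minimal sketch of a layer sweep using the `deep_daze` Python API; the exact parameter names are an assumption and may differ between versions:

```python
# Hypothetical layer sweep: same prompt and settings, only num_layers varies,
# so the outputs can be compared directly. Parameter names follow the
# deep_daze Python API circa early 2021 and may differ in other versions.
from deep_daze import Imagine

PROMPT = "a lighthouse in a storm"

for layers in (16, 32, 44, 64):
    run = Imagine(
        text=PROMPT,
        num_layers=layers,   # depth of the SIREN network
        batch_size=16,       # large batch, assuming a 24 GB card
        lr=1e-5,             # default learning rate
        epochs=3,
        save_progress=True,  # keep intermediate frames for comparison
    )
    run()
    # Note: deep_daze names output files after the prompt, so move or rename
    # the images between runs to avoid overwriting them.
```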

@afiaka87

@mallorbc it would be helpful if you could post some examples. Short of that, all I can say is that you'll need to decrease your learning rate by very small amounts until the image stabilizes. Having said that, a learning rate of 1e-5 (the default, I believe) has worked just fine for me at 44 layers, so that's a bit strange.

I'm not sure the model is really capable of converging at 64 layers either; I've tried to find a stable learning rate for that many layers and failed.

If you could post examples of the exact same prompt at 16, 24, 32, and finally 44 layers, it would be very helpful. I personally think 64 layers is past the point of diminishing returns, for whatever reason.
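
One way to search for a stable learning rate at high layer counts, as suggested above, is to step it down in small increments and do a short run at each value. A rough sketch (the values are illustrative guesses, not tested settings):

```python
# Step the learning rate down in small increments at 64 layers and watch
# whether the image stabilizes or diverges. Values are for illustration only.
from deep_daze import Imagine

for lr in (1e-5, 7e-6, 5e-6, 3e-6):
    run = Imagine(
        text="a lighthouse in a storm",
        num_layers=64,
        lr=lr,
        epochs=1,  # a short run is usually enough to see whether training diverges
    )
    run()
```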

@mroosen

mroosen commented Mar 18, 2021

Check out this table I've made, where you can see a quick example of layer/learning-rate combinations:

If you mouse over the final image, a video of the training plays.

Generated images are 416x416, which leaves some headroom on a 24GB 3090 (lucky to have one too).

https://mroosen.github.io/deep-daze-dreams/

@NotNANtoN
Contributor

That's a very nice overview! Regarding the question of how to best use a large amount of memory: at some point it becomes better to go into width rather than depth. In #103 I just added a CLI option to set the hidden_size, which was previously fixed at 256. The results with higher values are much more colorful and converge more quickly, but they also diverge quickly - I think when increasing hidden_size from 256 to 512 the learning rate should probably be halved, but I have not experimented with it extensively.

Here you can see a sweep of hidden size from 64 to 512, doubling with each row. I trained for 3 epochs, with 44 layers and a batch size of 16. For a hidden size of 512 the pictures shown are from only 1/6 of the training duration - notice how they already look quite converged. If I showed the final images they would have diverged, a bit like in the lower right of the matrix above by @mroosen.

[image: deepdaze_hidden_size_64_to_512]
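
As a rough sketch of how one might use the new width option (the argument name is assumed from #103, with the default learning rate halved per the note above):

```python
# Widen the SIREN instead of deepening it: hidden_size doubled from the old
# fixed 256 to 512, with the learning rate roughly halved as suggested above.
from deep_daze import Imagine

run = Imagine(
    text="a lighthouse in a storm",
    num_layers=44,
    batch_size=16,
    hidden_size=512,  # previously fixed at 256; configurable since #103
    lr=5e-6,          # about half the 1e-5 default
    epochs=3,
)
run()
```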
