More layers=Worse Result? #96
Comments
@mallorbc it would be helpful if you could post some examples. Short of that, all I can say is that you'll need to decrease your learning rate by very small amounts until the image stabilizes. Having said that, a learning rate of 1e-5 (the default, I believe) has worked just fine for me at 44 layers, so that's a bit strange. I'm not sure the model is really capable of converging at 64 layers either; I don't know, I've tried to find a stable learning rate for that many layers and failed. If you could post an example of the exact same prompt at 16, 24, 32, and finally 44 layers, it would be very helpful (see the sketch below for one way to set that up). I personally think 64 layers is past the point of diminishing returns, for whatever reason that is.
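A minimal sketch of that comparison, assuming the `deep_daze` Python API with an `Imagine` class and `text`, `num_layers`, `lr`, and `save_progress` keyword arguments (check the signature of your installed version; the prompt here is just a placeholder):

```python
from deep_daze import Imagine

PROMPT = "a field of flowers at sunset"  # keep the prompt identical across runs

# Sweep depth only, holding the learning rate fixed, so the renders stay comparable.
for layers in (16, 24, 32, 44):
    imagine = Imagine(
        text=PROMPT,
        num_layers=layers,
        lr=1e-5,             # the default mentioned above; lower it slightly if deeper runs diverge
        save_progress=True,  # keep intermediate frames so any divergence is visible
    )
    imagine()
```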
Check out this table I've made, where you can see a quick comparison of layer/learning-rate combinations; if you mouse over the final image, the video of the training plays. Generated images are 416x416, which leaves some headroom on a 24GB 3090 (lucky to have one too).
That's a very nice overview! Regarding the question of how to make optimal use of large amounts of RAM: at some point it becomes better to go into width rather than depth. In #103 I just added the option to set the hidden size. Here you can see a shift in hidden size from 64 to 512, doubling each time per row. I trained for 3 epochs, with 44 layers and a batch size of 16. For a hidden size of 512 the pictures shown are from only 1/6 of the training duration - notice how they already look quite converged. If I showed the final images, they would have diverged, a bit like in the lower right of the matrix above by @mroosen.
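A rough illustration of trading depth for width, assuming the option added in #103 is exposed as a `hidden_size` keyword on `Imagine` (the other keyword names and values mirror the settings described above, but are likewise assumptions to check against the current signature):

```python
from deep_daze import Imagine

# Fixed depth and batch size, sweeping width instead (64 -> 512, doubling each step).
for width in (64, 128, 256, 512):
    imagine = Imagine(
        text="a field of flowers at sunset",  # placeholder prompt
        num_layers=44,
        hidden_size=width,  # assumed name of the width option added in #103
        batch_size=16,
        epochs=3,
    )
    imagine()
```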
I have been blessed to get an RTX 3090, so I can run this model with many layers and a large batch size.
I have tried 64 layers, 44 layers, 32 layers, 16 layers, etc. In the runs I have done, it seems to me that, at least for 64 and 44 layers, the produced results are actually worse than with a lower number of layers. By worse, I mean less colorful and blurrier.
Is there a reason for this? Maybe it's due to the batch size? Any insight would be great.