Recommended Prodigy Settings (low steps per epoch) #8
Comments
Hi, I'm glad you had a positive experience with Prodigy!
I hope this helps, but let us know if you have any other questions.
Thank you for your reply. For #4, in my testing it seems a diverse set has no need for repeats, but a small set still benefits from them, though less so than with other, less dynamic optimizers. I'll check out polynomial vs. T_max=steps today.
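For anyone following along, here is a rough sketch (in plain PyTorch, outside of Kohya) of what the two scheduler choices being compared amount to; the optimizer, parameters, and step count below are placeholders, not settings from this thread:

```python
import torch
from torch.optim.lr_scheduler import CosineAnnealingLR, PolynomialLR

# Placeholder optimizer and step count, for illustration only.
params = [torch.nn.Parameter(torch.zeros(1))]
optimizer = torch.optim.SGD(params, lr=1.0)
total_steps = 1000

scheduler_name = "cosine"  # or "polynomial"
if scheduler_name == "cosine":
    # Cosine annealing over the whole run: T_max set to the total step count.
    scheduler = CosineAnnealingLR(optimizer, T_max=total_steps)
else:
    # Polynomial decay: multiplier (1 - t/total_iters)**power; power=1.0 is linear.
    scheduler = PolynomialLR(optimizer, total_iters=total_steps, power=1.0)
```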
I'm going to assume for the sake of this post that you are training a LoRA within Kohya, since that is the most common setup.

As it pertains to your concerns about T_max: if you use the "cosine" scheduler setting in Kohya, all of that is handled for you. You don't need to pass any additional arguments; it will do the math and set the LR schedule appropriately.

On the matter of repeats: you don't need them unless you're training multiple concepts or using regularization images. Without those, one repeat is largely equivalent to one epoch, so for easier tracking of your training it's best not to use repeats at all.

On the matter of "nailing the final 10-20%": I was dealing with a dataset that showed similar resistance to learning the finer details. Full disclosure, I am not technically competent enough to anticipate whether this has non-obvious adverse effects on the behavior of Prodigy, but what worked for me in getting the learning process to conclude properly (where adjusting d_coef, network dimensions, all manner of step counts, schedulers, and dataset images failed) was adjusting beta2 and the weight decay. In particular, the parameters that worked for me were betas of (0.9, 0.99), weight_decay of 0.1, and a batch size of 5 over about 1000-2000 steps, with everything else left at its defaults. I did not enable bias correction, but I'm not speaking for or against it because I haven't gone back to test that yet. If you are still struggling, you may want to give these settings a try and see what works for you. I suspect the most important parts were the lowered beta2 (which, as far as I can tell, should improve "remembering" details from previous steps) and the raised weight decay.

I'm sorry if this kind of discussion is not suited for the issues page of the optimizer, but I hope my personal observations on training diffusion models may help.
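To make the settings above concrete, here is a minimal sketch using the standalone `prodigyopt` package in PyTorch; the model is a placeholder, and this is just one reading of the comment, not an official recommendation:

```python
import torch
from prodigyopt import Prodigy  # pip install prodigyopt

# Placeholder model; substitute your LoRA parameters here.
model = torch.nn.Linear(8, 8)

optimizer = Prodigy(
    model.parameters(),
    lr=1.0,                     # Prodigy expects lr=1.0 and estimates the step size itself
    betas=(0.9, 0.99),          # lowered beta2, as suggested above
    weight_decay=0.1,           # raised weight decay
    use_bias_correction=False,  # left at the default (disabled) here
)
```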
Wow. Your betas suggestion is a dramatic improvement, thank you.
@konstmish Might I suggest including some of this information in your README? It answers a lot of questions I had that the README wasn't able to answer for me. |
@umarbutler |
Hi @madman404 , thanks for your important tip! Have you tried to use (0.9, 0.99) as betas for AdamW as well? Thanks. |
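In case it helps frame the question, this is how the same betas and weight decay would be passed to plain AdamW in PyTorch; the model and learning rate are placeholders, since AdamW, unlike Prodigy, does not adapt the learning rate on its own:

```python
import torch

# Placeholder model and learning rate, for comparison only.
model = torch.nn.Linear(8, 8)
optimizer = torch.optim.AdamW(
    model.parameters(),
    lr=1e-4,            # must be tuned by hand; Prodigy's lr=1.0 convention does not apply
    betas=(0.9, 0.99),  # same lowered beta2 as above
    weight_decay=0.1,   # same raised weight decay
)
```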
Hi, I'm using Prodigy to train Stable Diffusion LoRAs and I'm amazed at how resistant it is to overtraining, but I've had a hard time nailing the final 10-20% I need to properly say a model is trained. I have a few questions about schedulers and their args.
Thank you, I'd appreciate any advice if you're familiar with my questions.