Training issues and learning rates #128
Comments
Same issue. Have you solved this problem?
No, I haven't tried again.
I updated xformers from 0.0.16 to 0.0.17 and then it worked; maybe you can try that.
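Since the suggested fix is a version bump, a quick runtime check can confirm which xformers you are actually importing. This is an illustrative helper, not code from this repo; the `0.0.17` minimum is taken from the comment above.

```python
# Illustrative helper (not from this repo): verify the installed xformers is at
# least 0.0.17, since upgrading from 0.0.16 reportedly fixed the collapse issue.
from importlib.metadata import version, PackageNotFoundError

def version_tuple(v: str) -> tuple:
    # "0.0.17" -> (0, 0, 17); only the first three numeric parts are compared
    return tuple(int(p) for p in v.split(".")[:3])

def xformers_ok(minimum: str = "0.0.17") -> bool:
    try:
        return version_tuple(version("xformers")) >= version_tuple(minimum)
    except PackageNotFoundError:
        # xformers not installed at all
        return False
```

Running `xformers_ok()` before training makes the environment mismatch visible early instead of surfacing as noisy frames mid-run.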
Hi, what do your videos look like? The same motion?
Any update? |
Is it related to this issue? |
Hi! Thanks for releasing the models and the training code. That's a massive contribution!
I've tried to train the model, either by fine-tuning the already released checkpoint or by training from scratch, but the result is always the same: the model starts collapsing and the frames produced during training are pure noise.
Here is what I tested to prevent that:
- Gradient accumulation steps of 4 with batch size 1: no real change.
- Batch size 4 + gradient accumulation steps of 1 with gradient checkpointing: strangely, the model didn't seem to learn anything at all when gradient checkpointing was enabled.
- LR 1e-4: the model collapses after only 40 steps.
- LR 1e-5: the model collapses after around 100 steps.
- LR 1e-7: the model collapses after 10K steps, but it didn't learn anything before that.
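As a sanity check on the first two bullets: accumulating 4 micro-batches of size 1 should be numerically equivalent to one batch of size 4, provided each micro-loss is scaled by 1/accum_steps. A toy sketch with a least-squares model (all names illustrative, not this repo's code):

```python
# Toy check (illustrative): gradient accumulation over 4 micro-batches of
# size 1 equals the gradient of one size-4 batch when each micro-loss is
# scaled by 1/accum_steps.

def grad(w, x, y):
    # d/dw of the per-sample loss 0.5 * (w*x - y)^2
    return (w * x - y) * x

def accumulated_grad(w, samples, accum_steps):
    total = 0.0
    for x, y in samples:
        total += grad(w, x, y) / accum_steps  # scale before accumulating
    return total

def full_batch_grad(w, samples):
    # gradient of the mean loss over the whole batch
    return sum(grad(w, x, y) for x, y in samples) / len(samples)
```

If the real training loop forgets the 1/accum_steps scaling, the effective learning rate is 4x larger than intended, which would make the collapse thresholds above look earlier than expected.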
I haven't tried using the original dataset of videos; that would be my next test. Could it be caused by the videos I used? Something to do with FPS, or anything else?
Has anyone else managed to train from scratch or fine-tune? If yes, what LR did you use, and what other parameters did you change in the training.yaml file?
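For anyone comparing setups, here is a hedged sketch of the knobs being discussed. The key names are guesses and have not been checked against this repo's training.yaml; only the magnitudes come from the experiments above.

```yaml
# Illustrative only -- key names are assumptions, not verified against this
# repo's training.yaml. Values reflect the experiments reported above.
learning_rate: 1.0e-5           # 1e-4 through 1e-7 all collapsed or stalled here
train_batch_size: 1
gradient_accumulation_steps: 4  # effective batch size of 4
gradient_checkpointing: false   # enabling it reportedly prevented any learning
```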
Thanks